GPFS and Remote Shell

Yuri Volobuev, GPFS Development
Ver. 1.1, January 2015

Abstract

The use of a remote shell command (e.g. ssh) by GPFS is one of the most frequently misunderstood aspects of GPFS administration, to the point where it can become a barrier to GPFS adoption. There is much confusion around this topic. Is remote shell access the basic cost of GPFS access? Does GPFS have a hard dependency on SSH? Does GPFS require root-level passwordless SSH access between all nodes? My corporate IT security policy stipulates that PermitRootLogin must be set to "no", so this means I can't run GPFS, right? The short answer to all of these questions is "no"; the long answer requires some explaining.

Background

Being a file system, GPFS needs elevated privileges to operate. A portion of GPFS code runs in kernel space, and thus has the same level of access to the system as the OS kernel (i.e. the highest level of access possible). The userspace part of GPFS code also needs to perform many operations that require elevated privileges: communicating with its kernel counterpart, loading and unloading kernel modules, mounting and unmounting file systems, modifying system configuration files (e.g. /etc/fstab and entries under /dev), accessing raw disk devices, and so on. On a standard AIX or Linux install, this requires root-level access. This is a hard requirement that cannot be easily changed. So GPFS administration commands, generally known as mm commands, require root privileges to run, with a few exceptions.

A critical point that must be appreciated is that the core GPFS design is based on the trusted kernel assumption. GPFS is a cluster file system, not a client-server setup like NFS. Each GPFS node is capable of performing the full range of file system operations independently. This means that the kernel on each GPFS node has to be trusted to do the right thing. A user with root access on any GPFS node has full access to all file system data and metadata, and a malicious root user would be able to wreak havoc on any GPFS file system.

Being a cluster file system, GPFS needs to reach out to other nodes in the cluster. Different layers of GPFS code do this using different communication channels. The main GPFS daemon process, mmfsd, uses an RPC mechanism to communicate with the mmfsd processes running on other nodes. GPFS admin commands use a remote shell, such as RSH or SSH, to execute various commands on other nodes. The rationale for this architecture has roots in early GPFS history, and the details of remote shell use have evolved over time.

History

When GPFS was first released as a product in 1997, it was not a standalone piece of software, but rather a component in the IBM SP software stack. There was common infrastructure in the stack that GPFS code used for its needs. One basic need that GPFS has is bootstrapping: managing basic configuration covering things like cluster membership, defined file systems, disks belonging to GPFS, and so on. This configuration data must be available (and up to date) on all nodes in the cluster, and meet the basic clustering requirements: high availability, transactional semantics, and scalability to larger clusters. On the IBM SP, GPFS used a common infrastructure component known as the System Data Repository (SDR) for bootstrapping purposes. For general cluster administration, and for GPFS administration in particular, the remote shell of choice was RSH (those were simpler times).

Over the course of the following years, GPFS evolved into a standalone software product. An alternative mechanism was implemented for managing bootstrap configuration data. The data was stored in a text file, known as mmsdrfs (in homage to the IBM SP SDR roots), and the master copy of the file was managed by (typically) a pair of nodes known as configuration manager nodes (a primary and a backup). Whenever the configuration changed, a carefully orchestrated multi-phase commit operation would be carried out under the covers by GPFS admin code, using rsh and rcp commands, to provide proper transactional semantics. Once committed, the updated mmsdrfs file would be pushed out to the rest of the nodes in the cluster, again using rsh and rcp. In turn, other nodes in the cluster could pull an up-to-date copy of mmsdrfs using rsh and rcp (which could be needed, for example, if a node was down at the time of the configuration change). This meant that a remote shell connection to and from a configuration manager node might be needed at pretty much any time, for any node. At this point in time, the configuration requirement for GPFS was: any node in the cluster must be able to execute remote commands as root over rsh on all nodes in the cluster. Clearly, this is not what a security-conscious sysadmin would like to see, but again, those were simpler times.

More years passed. Gradually, an understanding set in that the RSH protocol is woefully insecure, and SSH rose to prominence as a more secure alternative. There was nothing in GPFS code that specifically required the use of rsh and rcp as such, and using ssh and scp as drop-in replacements was a simple step. The pathnames of the remote shell and remote copy commands became cluster configuration parameters. The use of SSH with GPFS has become a de facto standard (although some souped-up forms of RSH, e.g. Kerberos-enabled varieties, are still in use). However, the way remote shell and copy commands are called from GPFS has not changed; it remains very general, and not specific to any particular remote shell implementation. There is no hard requirement for ssh and scp as such.
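On current GPFS levels these pathnames are ordinary cluster parameters, so switching a cluster from rsh/rcp to ssh/scp is a small configuration change. A minimal sketch, assuming OpenSSH binaries under /usr/bin; the -r and -R options follow the mmchcluster syntax, and should be verified against the documentation for the GPFS release in use:

    # Point GPFS admin commands at ssh and scp instead of rsh and rcp
    mmchcluster -r /usr/bin/ssh -R /usr/bin/scp

    # mmlscluster reports the remote shell and remote file copy commands in use
    mmlscluster

Any pair of commands with compatible calling conventions can be substituted here, which is what makes the wrapper approach discussed later in this paper possible.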

The use of SSH, combined with the growth of GPFS cluster sizes, created new problems. When a large cluster is brought up, all nodes initiate SSH connections to one of the configuration manager nodes to verify that their copy of mmsdrfs is up to date. It turned out that handling a surge of incoming SSH connections is something sshd has trouble with, particularly on larger clusters. While tuning could ameliorate the problem somewhat, it was clear that a more scalable solution was needed. So mmsdrserv was implemented. At that point in time, mmsdrserv was a small, lightweight daemon that handled a few simple tasks related to mmsdrfs management, using custom RPCs over TCP/IP sockets: for example, fetching the current version number of mmsdrfs, or fetching the body of mmsdrfs to a client.

At that point, the all-to-all remote shell requirement was a source of significant consternation among GPFS users, for obvious reasons: if a single node in the cluster is compromised, the entire cluster is automatically compromised. Some way to tighten up the remote shell access requirements was needed, and the introduction of mmsdrserv offered an opportunity to do just that.

In GPFS V3.3, significant changes were made to the way admin commands operate. A new configuration parameter was introduced: adminMode. The allToAll setting corresponds to the old way of doing things. The central setting allows for a sharp reduction in the scope of remote shell access, as discussed in detail below.
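As a sketch of what adopting the central model looks like, assuming a reasonably recent GPFS level (adminMode is an ordinary configuration attribute, so mmchconfig is the natural tool, but verify against the documentation for the release in use):

    # Restrict remote shell use to the node where admin commands are issued
    mmchconfig adminMode=central

    # Confirm the current value
    mmlsconfig | grep -i adminMode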
Another significant change to the GPFS administration model was multi-clustering: the ability to mount a file system owned by a different cluster. In this model, several clusters can be set up and administered independently, with no need for command execution over a remote shell channel between them. This provided another avenue for reducing the scope of remote shell use.

In GPFS V4.1, a new mechanism for managing bootstrap configuration data was introduced: the Cluster Configuration Repository (CCR). When CCR is in use, once a cluster is created, the management of the master copies of configuration data is done entirely through an RPC mechanism, between the mmsdrserv (or mmfsd) processes running on quorum nodes.

What semantics does GPFS need from remote shell?

When the adminMode=central setting is in use, the exact requirement on remote shell and copy command semantics reads: when a GPFS management command is executed, it must be able to execute commands remotely on all other nodes in the cluster using the configured remote shell command, without being prompted for a password on the command tty. Only the tty used to execute the management command needs to be authorized.

So what is the rationale behind such precise wording? The intent is to allow GPFS commands to perform administration tasks cluster-wide, but with a limited level of authorization. Only one tty on one node needs to be authorized, and only when a GPFS management task needs to be performed; GPFS will not try to run remote shell commands under the covers in this mode. Very importantly, "without being prompted for a password on the command tty" is not equivalent to "passwordless". It only means that authentication needs to occur through a mechanism other than a tty prompt. One possible implementation that fits this model well is the SSH authorized_keys public/private key framework for granting trust, with the private key protected by a passphrase and ssh-agent used for passphrase prompting and caching. It is common to use a special hardened, sysadmin-only node for performing all GPFS management tasks, and to authorize only this node for batch-mode SSH access.

It is important that the remote shell command operates in batch (or promptless) mode: no prompting for input and no extraneous output on the command tty. GPFS code passes the -n switch (redirect stdin from /dev/null) to the remote shell command, so supplying input directly on the command tty is not possible, and the use of this option is essential to proper parsing of remote command output. It is perfectly fine if the remote shell command obtains authorization by prompting for a password (or passphrase) through an external channel, e.g. an X11 window, or reuses an authentication token from a pre-authorization operation.
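A minimal sketch of the passphrase-protected key plus ssh-agent arrangement described above, run as root on a designated admin node; the key file name and the node name nodeB are illustrative:

    # Generate a key pair; choose a non-empty passphrase when prompted
    ssh-keygen -t rsa -f ~/.ssh/id_rsa_gpfsadmin

    # Install the public key into root's authorized_keys on the other nodes
    ssh-copy-id -i ~/.ssh/id_rsa_gpfsadmin.pub root@nodeB

    # Start an agent for this session and cache the decrypted key;
    # the passphrase is asked for once, here, not by every mm command
    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/id_rsa_gpfsadmin

    # GPFS admin commands issued from this tty can now execute remote
    # commands promptlessly; this is roughly what such a call looks like
    ssh -n -o BatchMode=yes root@nodeB /bin/true

With BatchMode enabled, ssh fails outright instead of falling back to a password prompt, which matches the promptless behavior GPFS expects.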

In a multi-homed environment, i.e. a configuration where multiple network interfaces are defined on a node, a question naturally arises: which network interfaces will GPFS use, in particular for running remote shell commands? Only admin interfaces are used for that purpose. By default, the admin interface is the one corresponding to the hostname passed to the mmcrcluster or mmaddnode command when the node is added to the cluster. It is possible to specify a different admin interface using mmchnode. Remote shell connections only need to be authorized for the admin interface, not for any other interfaces that may be defined on a node. Other parts of GPFS, in particular the mmfsd daemon, may use other interfaces if configured to do so, but that does not involve the use of a remote shell.

What else is possible?

So what can one do if the remote shell semantics explained above are not acceptable? For example, what if PermitRootLogin must be disabled per corporate security policy, with no exceptions allowed? Does this rule out using GPFS? Not necessarily. An important point to remember is that GPFS allows using any pair of commands that provide the general semantics of rsh and rcp. While the ssh and scp pair is the most obvious candidate, the playing field is not restricted to those two. One potentially productive approach is to implement a pair of wrapper commands that provide the expected semantics externally, and internally do whatever it takes to get the job done. This may involve a custom-designed communication tunnel, an exotic authentication method, or any combination of things. For the specific problem of PermitRootLogin, one possible approach is to leverage sudo, or a sudo-like framework, for privilege manipulation. It is possible to kick off a GPFS admin command using sudo, have the wrappers use ssh to log in to the remote node under a non-root ID, and then use sudo on the remote side to execute the necessary commands. PermitRootLogin can be set to "no" in this scenario. It is still necessary to allow promptless remote command access for the user ID in question, and sudo must allow promptless execution of a few commands for this ID. A sample of sudo-based wrappers is available on request by contacting gpfs@us.ibm.com.
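Purely as an illustration of the wrapper idea (this is not that sample, and it omits all error handling), a remote shell wrapper could look roughly like the sketch below; the gpfsadmin ID and the wrapper path are assumptions:

    #!/bin/sh
    # Hypothetical /usr/local/bin/sshwrap: remote shell wrapper for GPFS.
    # Invoked roughly as: sshwrap [-n] <host> <command ...>
    # Logs in under the unprivileged gpfsadmin ID, then elevates with sudo.

    NFLAG=""
    if [ "$1" = "-n" ]; then
        NFLAG="-n"
        shift
    fi

    HOST="$1"
    shift

    # BatchMode keeps ssh from prompting on the command tty;
    # sudo -n likewise refuses to prompt rather than hang
    exec /usr/bin/ssh $NFLAG -o BatchMode=yes "gpfsadmin@$HOST" sudo -n "$@"

A matching remote copy wrapper would be needed as well, and the pair would then be configured as the cluster's remote shell and remote copy commands in the same way ssh and scp are.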

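On the remote nodes, the pieces such a scheme relies on are ordinary sshd and sudoers configuration; an illustrative fragment (the gpfsadmin ID is an assumption, and the command list should be narrowed to what the wrappers actually run):

    # /etc/ssh/sshd_config: root logins remain disabled
    PermitRootLogin no

    # /etc/sudoers (edit with visudo): promptless sudo for the wrapper ID.
    # ALL is shown only for brevity; list the specific GPFS commands instead.
    gpfsadmin ALL=(root) NOPASSWD: ALL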
In those situations where no form of promptless remote shell access is possible on a given node, it is still possible to mount a GPFS file system that is exported from a different cluster. The obvious disadvantage is the disjoint system administration model: the unit of GPFS administration is a single cluster, so if multiple clusters are defined, each needs to be administered separately. However, in certain cases this may be a fair tradeoff for not requiring remote shell access.

Summary

GPFS uses a fairly flexible framework for performing administrative tasks. This framework has evolved substantially from its early implementation, and some of the preconceived notions about GPFS requirements for remote shell configuration are no longer true. It is possible to configure GPFS to run in a wide variety of system configurations.