White Paper. Managing MapR Clusters on Google Compute Engine


MapR Technologies, Inc.

Introduction

Google Compute Engine is a proven platform for running MapR. Consistent, high-performance virtual machines, coupled with a high-bandwidth, low-latency network linking them together and to the rest of the Google Cloud Platform services, deliver a solid foundation for cloud-based data processing architectures. The ability to quickly instantiate numerous virtual machines on demand and the availability of per-minute pricing make Compute Engine well suited and cost-effective for spinning up and turning off ad hoc clusters. Further, Compute Engine's advanced routing and data encryption features, together with Google's global network performance, enable the construction of secure, compelling hybrid solutions.

This paper presents several techniques for those who wish to manage their own MapR installations on Google Compute Engine, along with selected scenarios (migration across zones, disaster recovery, and high availability) that arise when dealing with long-lived clusters and operating across multiple zones. The list is neither exhaustive nor authoritative: other solutions certainly exist and might be more applicable to specific situations. The paper does, however, illustrate how the same features that make Compute Engine a powerful platform on which to run MapR can provide the basis for MapR cluster management solutions.

Scenarios

The scenarios presented in this section are all based on the premise that the cluster under consideration ought to be preserved to the greatest extent possible when faced with an adverse environmental event (for example, a zone is shut down for maintenance or experiences an unanticipated outage). This is a common situation for long-lived clusters used in support of job pipelines and stream processing. On-demand clusters [1] require little or no management and are, generally speaking, outside the scope of this document.

Further, the scenarios presented assume that data consumed and generated by the cluster needs to reside durably in a self-managed MapR-FS (MapR File System) [2]. Administrators can plan for maintenance events and manage cluster availability accordingly. However, should an unexpected situation occur, jobs currently being processed can be impacted. Specific outcomes vary: in some cases these jobs are restarted rather than resumed, and in others they may need to be resubmitted.

Zone Migration

At some point, it may become desirable or necessary to move or clone a cluster from one zone to another. This section presents two zone migration scenarios:

1. Minimize the unavailability of the cluster to accept and run MapReduce jobs, at the expense of operational complexity.
2. Trade additional unavailability in favor of significantly simpler management.

[1] Google Compute Engine's fast virtual machine creation and sub-hour billing model enable customers to perform data processing using ad hoc clusters, without the overhead of managing resources constrained by timeboxed pricing.
[2] Relaxing this constraint enables one to take advantage of the preferred approach of leveraging Google Cloud Storage as the durable repository.

Zone Migration Scenario 1: Minimizing Unavailability

Many MapR clusters are integral parts of a business's daily operations. Any time during which users cannot submit work can be costly. This first approach attempts to reduce that time as much as possible in a traditional deployment. First, new FileServers and TaskTrackers are added to the cluster in the destination zone; then the Container Location Database (CLDB) and JobTracker master nodes are switched over as well. This method bounds the cluster's unavailability to roughly the interval between the shutdown of the master services in the source zone and their startup in the destination zone.

Considerations

FileServers may use scratch disks, persistent disks, or both for MapR-FS. Persistent disks offer more storage space [3] as well as durability beyond the lifetime of an instance. With persistent disks, administrators have the option of reducing the MapR-FS replication factor while still protecting data in the event of a zone becoming unavailable. Using persistent root disks enables any node to survive an unexpected reboot.

This scenario makes use of MapR rack topologies to enable the cluster to continue running while nodes are decommissioned [4]. Topology describes the locations of nodes and racks in a cluster. The MapR software uses node topology to determine the location of replicated copies of data. An optimally defined cluster topology results in data being replicated to separate racks, providing continued data availability in the event of rack or node failure. Data is copied across zones when the Container Location Database (CLDB) distributes replicated data containers on separate racks. A rack is a logical collection of nodes configured with FileServers and TaskTrackers running in the same zone.

Figure 1: Minimizing unavailability during zone migration

[3] As of July 2013, up to 10 TB of persistent disk storage can be mounted on standard instance types, compared to a maximum of 3.5 TB of scratch disk space.
[4] Refer to "Decommissioning a Node" for more details on decommissioning nodes.

Approach

1. Run the configure.sh script to configure new nodes. The nodes are automatically added to the MapR cluster when Warden (the service that starts and stops other services in a cluster) starts the services on the new node. You can create new instances for these nodes programmatically using the gcutil utility, the REST API, or the Google client libraries, or manually via the Google Cloud Console. The new nodes should identify themselves with a pair of new rack identifiers, different from those in the source zone. Organizing nodes into different topologies ensures that only one copy of any data container is removed from the zone when a rack is decommissioned. MapR-FS replication ensures that a copy of every data container exists on at least two different racks and that every data container in the decommissioned rack is available on a rack in the destination zone.

2. Migrate the CLDB and JobTracker. There are several ways to implement this step. The automatic approach is recommended if persistent root and data disks have been used, as this greatly simplifies the process. The manual approach may be necessary when performing custom configurations and/or moving data from scratch disks.

Automatic Approach

a) Shut down the cluster.
b) Use the gcutil moveinstances command to migrate the nodes into the destination zone. This command copies instance configurations, takes snapshots of the persistent disks, deletes the existing instances, and then recreates the instances in the destination zone.

Once the new instances are up and services are running, the cluster is available to accept jobs.

Manual Approach

a) Shut down the cluster.
b) If persistent root and/or data disks have been used, create snapshots for later use.
c) If any data residing on scratch disk must be preserved, take either of the following actions:
   i. Create and attach a new persistent disk, copy the data from the scratch disk, detach the persistent disk, and create a snapshot.
   ii. Store the data in Google Cloud Storage. For example, use the following command to stream a compressed archive of the src directory directly into a bucket:

       tar zcf - <src> | gsutil cp - gs://<bucket>/src.tar.gz

d) Terminate the nodes in the source zone.
e) If applicable, create new persistent root and data disks from the snapshots created previously.
f) Use the new persistent root and data disks created in the previous step to create instances for the nodes in the destination zone.
g) If data has been temporarily stored in Google Cloud Storage, copy the data from Google Cloud Storage to the new instances.
h) Start Warden and ZooKeeper. Warden will start up the master services on the new nodes.

Once the new instances are up and services are running, the cluster can accept jobs.

3. Decommission one rack of nodes and TaskTrackers. Move nodes assigned to racks in the source zone to an isolated topology and decommission [4] them using the following steps:

a) Blacklist the TaskTrackers, and wait for the blacklist to finish.
b) Move the node to the /offline or /decommissioned topology.
c) When the node no longer has non-local data (the data has been "drained"), shut down the FileServer service.
d) Remove the node from the cluster using the maprcli node remove command.

4. Repeat step 3 for all racks.
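The rack-by-rack decommissioning in steps 3 and 4 relies on every data container keeping at least one replica outside the rack being removed. The Python sketch below illustrates that check under stated assumptions: the placement map, container IDs, and rack paths are hypothetical, and in a real cluster this information would come from MapR's own tooling, which the sketch does not call.

```python
def unsafe_containers(replica_racks: dict[str, set[str]], rack: str) -> list[str]:
    """Containers that would have no replica left if `rack` were removed."""
    return sorted(cid for cid, racks in replica_racks.items() if not racks - {rack})


# Hypothetical placement: container id -> racks that hold a replica.
placement = {
    "c1": {"/src/rack1", "/dst/rack1"},
    "c2": {"/src/rack1", "/src/rack2", "/dst/rack2"},
    "c3": {"/src/rack2"},  # not yet replicated to the destination zone
}

for rack in ("/src/rack1", "/src/rack2"):
    blocked = unsafe_containers(placement, rack)
    print(rack, "safe to decommission" if not blocked else f"blocked by {blocked}")
```

In this made-up placement, the first source rack can be removed, while the second must wait until container c3 has a replica on a destination rack.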

Zone Migration Scenario 2: Simplifying Management

This scenario shows how to perform a migration with a single command, easing migration operations at the possible expense of longer cluster downtime. The process is nearly identical to the automatic migration of the Container Location Database (CLDB) and JobTracker instances described in scenario 1, except that gcutil moveinstances migrates the entire cluster between zones.

Considerations

Use persistent disks when adding data disks to nodes for MapR-FS (MapR File System).

Figure 2: Simplifying management for zone migration

Approach

Migrate the cluster. Use the gcutil moveinstances command to clone all of the nodes in the cluster and their persistent disks into the destination zone. For example, consider a small cluster with a single master node (mapr-maprm) and ten worker nodes (mapr-maprw-000, ..., mapr-maprw-009). The following single call moves the cluster out of the us-central2-a zone and restarts it in us-central1-a:

    gcutil --project=<project> moveinstances mapr-maprm mapr-maprw-00\\d+ --source_zone=us-central2-a --destination_zone=us-central1-a
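Before running a move against a live cluster, it can help to check which instance names a pattern such as mapr-maprw-00\d+ would select. The Python sketch below applies the two name arguments from the example to the example cluster. It assumes that each argument is treated as a regular expression that must match the whole instance name; that matching rule is an assumption made for illustration, not something this paper states, so verify it against the gcutil documentation.

```python
import re

# Instance names from the example cluster: one master and ten workers.
instances = ["mapr-maprm"] + [f"mapr-maprw-{i:03d}" for i in range(10)]

# Name arguments as the tool would receive them once an unquoted
# shell argument has collapsed "\\d" to "\d".
name_patterns = ["mapr-maprm", r"mapr-maprw-00\d+"]


def select_instances(names, patterns):
    """Return the names fully matched by at least one pattern (assumed semantics)."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(c.fullmatch(n) for c in compiled)]


selected = select_instances(instances, name_patterns)
print(len(selected), "of", len(instances), "instances selected")
```

Under the same assumption, a cluster with more than ten workers would need a wider pattern, since a name such as mapr-maprw-010 does not begin with mapr-maprw-00.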

Disaster Recovery

Zone-to-zone migration is an effective way to plan for and gracefully manage anticipated zone maintenance windows and other drivers of cluster relocation. However, it is always good practice to account for the unexpected: catastrophic zone-wide failures are extremely rare but can occur [5]. You can deploy a cluster across multiple zones for disaster recovery.

Multi-zone Cluster

When you deploy a cluster across multiple zones [6], commission another set of FileServers and TaskTrackers in a second zone. Add a standby Container Location Database (CLDB) and JobTracker to the second zone for failover.

Considerations

This pattern requires twice as many FileServers and TaskTrackers as would otherwise be deployed. It also requires substantial cross-zone communication. While this pattern results in twice the processing capability, operational (instance, storage, and network) expenses will increase, and overall performance could be degraded, especially if data is accessed across zones. MapR provides high availability for the Container Location Database (CLDB) and JobTracker.

Figure 3: A multi-zone cluster for disaster recovery

[5] Google Compute Engine Service Level Agreement.
[6] This option is presented for the sake of completeness and is not typically recommended because it incurs additional costs and may impact performance.

Approach

1. Distribute the cluster. FileServers and TaskTrackers are spread equally across both zones and fully commissioned. The FileServers within a zone should identify with the same rack, and each zone should use a different rack identifier, ensuring that at least one copy of each data container exists in the second zone.
2. Fail over the CLDB. The CLDB can be restored in the second zone automatically.
3. Fail over the JobTracker. The JobTracker can be restored automatically in the second zone.

High Availability

MapR is the only distribution with high availability at the cluster and job levels to eliminate any single points of failure across the system. MapR distributes metadata across the cluster to avoid bottlenecks and improve cluster performance. The JobTracker in MapR has high availability. If the service crashes, a new JobTracker automatically picks up where the original JobTracker stopped, and MapReduce jobs can continue without any restart or intervention. If the node itself crashes, a new node automatically takes over and continues the process.

Google Compute Engine Network

DNS

One of the benefits of Google Compute Engine is the ability to address an instance on the network in any zone in any region by its user-provided hostname. Additionally, these names can be reused when instances are recycled (turning down one instance and spinning up another with the same name), which is convenient for deployments that want to rely on name-based addressing. It is also worth noting that there are no guarantees around internal IP address assignment. Instance hostnames can be reused dynamically because DNS entries are cleaned up almost immediately after an instance is terminated.

ZooKeeper

The MapR cluster uses Apache ZooKeeper to coordinate services; ZooKeeper enables high availability (HA) and fault tolerance for MapR clusters. Warden will not start any services unless ZooKeeper is reachable and more than half of the configured ZooKeeper nodes (a quorum) are live.

Special considerations apply to DNS when deploying ZooKeeper for MapR on Google Compute Engine. Currently, ZooKeeper resolves DNS names once, at startup. This prevents the replacement of a member instance without restarting the remaining members of the ensemble. Given the behavior of Google Compute Engine DNS described above, any time a node running ZooKeeper is rebooted or replaced, all other ZooKeeper services must be restarted. Failure to do so will leave the existing ensemble with one fewer member, and the new instance will be running what amounts to an orphaned ZooKeeper instance.
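The quorum rule above (more than half of the configured ZooKeeper nodes must be live) can be written down directly. The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical and it does not query a real ensemble.

```python
def has_quorum(configured: int, live: int) -> bool:
    """True when more than half of the configured ZooKeeper nodes are live."""
    if configured <= 0 or not 0 <= live <= configured:
        raise ValueError("expected configured > 0 and 0 <= live <= configured")
    return 2 * live > configured


def tolerated_failures(configured: int) -> int:
    """Largest number of node losses that still leaves a quorum."""
    return (configured - 1) // 2


for size in (3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

The arithmetic shows why an ensemble that silently loses a member is a concern: a three-node ensemble running with two effective members still has a quorum, but it cannot tolerate any further failure.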

Straddling Multiple Zones

High availability can also be considered in terms of the resilience of the cluster to zone failure. This might be applicable in scenarios where any interruption of long-running jobs or pipelines might be detrimental and/or incur substantial costs. Consider the scenario where zone A is scheduled for maintenance earlier than zone B. Given this knowledge, it is possible to construct a cluster spanning both zones such that it will continue to run in the event that either goes down.

Figure 4: Straddling multiple zones for high availability

1. Distribute the cluster. As previously addressed, FileServers and TaskTrackers are split equally across both zones and commissioned as part of the MapR-FS cluster; multiple MapR-FS racks are used to ensure data is fully replicated.
2. Distribute the ZooKeeper ensemble. The majority of ZooKeeper nodes must be deployed in zone B along with the standby CLDB and the active JobTracker.
3. Manage the CLDB. If zone A becomes unavailable, the ZooKeeper ensemble still maintains a quorum and can automatically facilitate the failover and promotion of the standby CLDB to active. If, instead, zone B experiences an outage, the active CLDB will continue to run. The cluster, however, will not be able to sustain an active node failure because the ensemble has no quorum. Replacement ZooKeeper nodes must be added and the ensemble must be restarted.
4. Manage the JobTracker. The active JobTracker is deployed to zone B so that if zone A becomes unavailable (as anticipated), any active jobs can continue running until completion.
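The placement in step 2 and its consequences in step 3 can be checked with a few lines of arithmetic. The Python sketch below assumes a hypothetical three-node ZooKeeper ensemble with one node in zone A and two in zone B; the zone labels and node counts are illustrative and do not come from this paper.

```python
def surviving_quorum(ensemble_by_zone: dict[str, int], failed_zone: str) -> bool:
    """True if the ZooKeeper nodes outside failed_zone are more than half of all nodes."""
    total = sum(ensemble_by_zone.values())
    live = total - ensemble_by_zone[failed_zone]
    return 2 * live > total


# Hypothetical three-node ensemble with the majority placed in zone B.
ensemble = {"zone-a": 1, "zone-b": 2}

for zone in ensemble:
    state = "keeps" if surviving_quorum(ensemble, zone) else "loses"
    print(f"outage of {zone}: ensemble {state} its quorum")
```

No split across two zones keeps a quorum through the loss of either zone, which is why the majority is placed in the zone expected to stay up and why replacement ZooKeeper nodes are needed after an outage of zone B.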

Conclusion

Google Cloud Platform not only offers a high-performance platform on which to run MapR, but also provides features and tools that help maintain clusters across zones and keep business-critical jobs running in zone migration, disaster recovery, and high availability scenarios. Persistent disks, Google Cloud Storage, and the network infrastructure enable efficient data and instance migration, and gcutil provides a rich set of commands to help accomplish cluster management tasks.

Additional Resources

MapR, Hive, and Pig on Google Compute Engine: a Google Cloud Platform solution with more on how to take advantage of Google Compute Engine, with support from Google Cloud Storage, to run a self-managed MapR cluster with Apache Hive and Apache Pig as part of a Big Data processing solution.

Google-compute-engine-cluster-for-hadoop: a sample application to assist in setting up Hadoop compute clusters and executing MapReduce tasks. Please note that this application does not perform any of the cluster management outlined in this paper.

MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease of use, and world-record speed to Hadoop, NoSQL, database, and streaming applications in one unified big data platform. MapR is used by more than 500 customers across financial services, retail, media, healthcare, manufacturing, telecommunications, and government organizations, as well as by leading Fortune 100 and Web 2.0 companies. Amazon, Cisco, Google, and HP are part of the broad MapR partner ecosystem. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures. MapR is based in San Jose, CA. Connect with MapR on Facebook, LinkedIn, and Twitter.

MapR Technologies. All rights reserved. Apache Hadoop, HBase and Hadoop are trademarks of the Apache Software Foundation and not affiliated with MapR Technologies.