Couchbase Server Technical Overview. Key concepts, system architecture and subsystem design


Table of Contents
What is Couchbase Server?
System overview and architecture
  Overview
    Couchbase Server and client software
    Couchbase Server in the application stack
  Data flow in a Couchbase Server environment
    Between application and Couchbase Server
    Within the Couchbase Server cluster
  Top-level software block architecture
Couchbase Server Data manager
  TCP ports
  Embedded Moxi
  Memcached protocol listener/sender
  Couchbase Server storage engine
Couchbase Server Cluster manager
  TCP ports
  REST management API
  Per node configuration management and monitoring functions
  Per cluster functions
Getting started with Couchbase Server
Glossary

What is Couchbase Server?

Couchbase Server is a simple, fast, elastic NoSQL database, optimized for the data management needs of interactive web applications. Couchbase Server makes it easy to match resources to the changing needs of an application by automatically distributing data and I/O across commodity servers or virtual machines. It scales out and supports live cluster topology changes while continuing to service data operations. Its managed object caching technology delivers consistent, sub-millisecond random reads while sustaining high-throughput writes. As a document-oriented database, Couchbase Server accommodates changing data management requirements without the burden of schema management.

Key Couchbase Server characteristics and capabilities include:

Push-button elasticity
- Add or remove multiple servers simultaneously with the push of a button
- Efficient data rebalancing without requiring application changes

Memcached compatible
- Easy to get started: Couchbase Server is a drop-in replacement for memcached
- Simple, easy-to-use and widely supported key-value interface

Zero-downtime maintenance
- Add or remove servers, upgrade software and perform any maintenance task in a live cluster
- No application downtime required
- No application performance degradation

Enterprise-class monitoring and administration
- Deeply instrumented monitoring with a rich administration GUI
- Dynamic system monitoring charts
- Backup and restore capability

RESTful management API
- Easy interface to external monitoring and management systems
- Easy to automate deployment to the cloud

Reliable low-latency storage architecture
- Memcached inside: the caching technology has 10 years of production maturity and powers 18 of the top 20 web applications on the planet
- Efficient use of memory (an object-level cache prevents the thrashing inherent to page-level approaches)
- Predictable low latency
- No memory-mapped files
- Pull the plug on a server without fear of stored data corruption

Data replication with auto-failover
- Maintain multiple copies of your data within the cluster for high availability
- User-configurable replication count
- User-configurable failover policy to ensure data availability in the face of hardware failure

Professional SDKs for a wide variety of languages
- Well-documented, easy-to-use SDKs make it easy for developers to build applications that store data in Couchbase
- Support for Java, C#, PHP, C, Python and Ruby

At the highest level, Couchbase Server is simple, fast, elastic and reliable. Every feature and design decision is weighed against these core principles:

Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and using it. Because it is a document database, there is no need to create and manage schemas, and never a need to normalize, shard or tune the database. Build applications faster, keep them running reliably and easily adapt them to changing business requirements.

Fast. Couchbase Server is screamingly, predictably fast. It is the lowest-latency, highest-throughput NoSQL database technology available. Read and write data with consistently low latency and sustained high throughput across the scaling spectrum. Get the performance you need at lower cost.

Elastic. By automatically distributing data and I/O across commodity servers or virtual machines, Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes, or shrink a cluster, to sustain application performance while precisely matching cost to demand. There are no single points of failure in a Couchbase cluster, and all operations function across the entire cluster. Sophisticated replication and persistence subsystems guarantee continuous operations.

Reliable. Couchbase Server is enterprise-ready software that you can depend on for mission-critical applications. With zero-downtime maintenance and rich monitoring capabilities, you can deploy mission-critical applications with confidence.

System overview and architecture

Overview

Couchbase Server and client software

A Couchbase Server is a computer (e.g., a commodity Intel server, a VMware virtual machine or an Amazon machine instance) running Couchbase Server software. Couchbase Server runs on 32- and 64-bit Linux, Windows and Mac operating systems. The source code is a mix of C, C++ and Erlang, with some utility functionality authored in Python.

Each server in a Couchbase Server cluster runs identical Couchbase Server software, meaning all Couchbase Server nodes are created equal. A number of benefits flow from the decision to avoid special-case nodes running differentiated software or exhibiting differentiated functionality (e.g., masters, slaves, cluster managers, configuration servers):

1. No single point of failure. Nodes can fail at any time (up to the replication count of the cluster), and a Couchbase Server cluster can continue to process data operations for the entire key space with no loss of administrative functionality. If the server holding the global singleton (the elected leader of the cluster) is lost, the Erlang-based cluster management system elects a new leader, and cluster management operations continue without impacting the applications on top. Given the distributed architecture of Couchbase Server, data operations would continue uninterrupted even if the cluster management subsystem were to fail completely.
2. Get started with one node. The full functionality of Couchbase Server is available with a single package installation. Download, install and begin using Couchbase Server in five minutes or less, on just one node if desired.
3. Clone to grow. Because all nodes are alike, you can literally clone a virtual machine running Couchbase Server software and join it to a cluster (one mouse click). You then rebalance the cluster (another mouse click) to migrate data to the new server, balancing data and I/O across the cluster. You can do this with many servers at once, and the entire process can be automated with the Couchbase Server CLI utility or REST calls.

An application interacts with a Couchbase Server cluster through a memcached client library, typically over a network connection. The client library uses an algorithm to calculate the virtual bucket (vbucket) in which a given key's value is located. The algorithm is pluggable, but hashing is the default in Couchbase Server. Couchbase Server hashes each key to one of 1024 vbuckets.
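The key-to-vbucket step can be sketched in a few lines of Python. This is a minimal sketch. It assumes CRC32 (Python's `zlib.crc32`) as the hash function and a plain modulus over the 1024 vbuckets stated above. Real client libraries may use a different hash or extra bit manipulation. The function name `vbucket_for_key` is hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # fixed vbucket count described in this paper


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Return the vbucket id (0..num_vbuckets-1) that owns `key`.

    Assumption: CRC32 modulo the vbucket count. Real clients may use a
    different hash or extra bit manipulation before the modulus.
    """
    digest = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return digest % num_vbuckets


if __name__ == "__main__":
    for k in ("user::1001", "session::abc", "cart::42"):
        print(f"{k} -> vbucket {vbucket_for_key(k)}")
```

The number of vbuckets is fixed, so a key's vbucket never changes when servers are added or removed. Only the vbucket-to-server mapping changes.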

The client then uses the vbucket number as an index into the vbucket map data structure. From the map it looks up the individual server in the cluster responsible for the data in that vbucket, including master and replica server responsibilities. Memcached client libraries are available for practically every language and application framework.

Couchbase Server in the application stack

As shown in Figure 1, Couchbase Server supports a scale-out architecture at the data layer. Couchbase Servers are deployed as a cluster behind web application servers, spreading data and I/O operations evenly across the cluster. Servers can be added to, and removed from, a live cluster. This deployment model matches existing best-practice architecture at the application logic tier. There, new web servers are deployed alongside existing servers and placed into rotation behind a load balancer. With Couchbase Server, client-side logic effectively load-balances data operations across the cluster through a key hashing and server mapping algorithm.

Figure 1: Couchbase Server deployment architecture (load balancer, web servers, Couchbase Servers)
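The lookup from vbucket number to servers can be sketched as follows. The sketch assumes the vbucket map is a list indexed by vbucket number. Each row holds server indexes, master first and replicas after. `build_vbucket_map` and `route` are hypothetical helpers. The round-robin layout is illustrative only and is not how Couchbase Server actually places vbuckets. The hash is again assumed to be CRC32 modulo 1024.

```python
import zlib
from typing import List, Tuple

NUM_VBUCKETS = 1024


def vbucket_for_key(key: str) -> int:
    # Assumed hash: CRC32 modulo the vbucket count.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(num_servers: int, replicas: int) -> List[List[int]]:
    """Hypothetical layout: row vb = [master, replica1, ...] as server indexes."""
    if replicas + 1 > num_servers:
        raise ValueError("each copy of a vbucket needs a distinct server")
    return [[(vb + r) % num_servers for r in range(replicas + 1)]
            for vb in range(NUM_VBUCKETS)]


def route(key: str, servers: List[str], vb_map: List[List[int]]) -> Tuple[int, str, List[str]]:
    """Return (vbucket, master server, replica servers) for a key."""
    vb = vbucket_for_key(key)
    master_idx, *replica_idxs = vb_map[vb]  # the vbucket number indexes the map
    return vb, servers[master_idx], [servers[i] for i in replica_idxs]


if __name__ == "__main__":
    servers = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]  # hypothetical hosts
    vb_map = build_vbucket_map(len(servers), replicas=1)
    print(route("user::1001", servers, vb_map))
```

The map is the only part that changes during rebalance or failover. The client sends each data operation directly to the master server it finds in the map.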

Data flow in a Couchbase Server environment

Between application and Couchbase Server

Figure 2: The Couchbase Server (memcached) client hashes KEY and identifies KEY's master server. Couchbase Server replicates the KEY-VALUE pair, caches it in memory and stores it to disk.

Figure 2 shows the flow of data from an application to a Couchbase Server cluster, illustrating a data write operation. The illustration starts at the presentation layer:

1. An application user takes an action that requires updating a data item in Couchbase Server.
2. The application server responding to the user action updates the key's value and calls a memcached client library to set the key-value pair.
3. The memcached client library selects the server currently serving as master for the referenced key and transmits the operation to that server.
4. (and 5.) Upon arrival, Couchbase Server replicates, caches and stores the data, as detailed in the next section.

Within the Couchbase Server cluster

Figure 3 picks up from step 5 in Figure 2 and shows the processing of the set operation inside the Couchbase Server cluster.

1. The set arrives at the Couchbase Server listener-receiver.
2. Couchbase Server immediately replicates the data to replica servers; the number of replica copies is user-defined. Upon arrival at the replica servers, the data is persisted.
3. The data is cached in main memory.

4. The data is queued for persistence, and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
5. The set acknowledgment is returned to the application.

Figure 3: Data flow within the Couchbase Server cluster on write

Top-level software block architecture

At the highest level, Couchbase Server has two distinct functional blocks: the Data Manager and the Cluster Manager. With some effort, it is possible to selectively build Couchbase Server completely devoid of a Cluster Management subsystem. In that case, node configuration management, replication, health monitoring and other capabilities would have to be performed by an external system.

Figure 4: Couchbase Server software architecture
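The de-duplicating persistence queue in steps 3 to 5 can be modeled as below. This is a toy, single-node sketch. The class `WritePath` is hypothetical, replication (step 2) is omitted, and a Python dict stands in for disk. The queue holds keys rather than values, and each value is read from cache only when its write is flushed. As a result, several sets to the same key before a flush cost only one disk write.

```python
from collections import deque
from typing import Deque, Dict, Set


class WritePath:
    """Toy single-node model of caching plus de-duplicated persistence."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}        # main-memory object cache
        self.dirty_queue: Deque[str] = deque()   # keys waiting to be persisted
        self.pending: Set[str] = set()           # keys currently in the queue
        self.disk: Dict[str, bytes] = {}         # stand-in for disk or SSD
        self.disk_writes = 0

    def set(self, key: str, value: bytes) -> str:
        self.cache[key] = value                  # step 3: cache in memory
        if key not in self.pending:              # step 4: queue unless a write is pending
            self.pending.add(key)
            self.dirty_queue.append(key)
        return "STORED"                          # step 5: acknowledge the client

    def flush(self) -> None:
        """Drain the queue, reading each value from cache at write time."""
        while self.dirty_queue:
            key = self.dirty_queue.popleft()
            self.pending.discard(key)
            self.disk[key] = self.cache[key]     # latest cached value wins
            self.disk_writes += 1


if __name__ == "__main__":
    wp = WritePath()
    wp.set("counter", b"1")
    wp.set("counter", b"2")   # de-duplicated: still one queued write
    wp.flush()
    print(wp.disk, wp.disk_writes)
```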

Data Manager. The data manager does the work of storing and retrieving data in response to data operation requests from applications. It exposes two memcapable ports to the network. One port supports non-vbucket-aware memcached client libraries (pre-memcapable 2.0 API), whose requests are proxied if required. The other port expects to communicate with vbucket-aware clients (memcapable 2.0+ API). The majority of code in the Data Manager is C and C++.

Cluster Manager. The cluster manager supervises the configuration and behavior of all nodes in a Couchbase Server cluster. Cluster management code runs on every node in the cluster. At any point in time, one node (the one holding a global singleton) is elected to perform aggregation, consensus building and cross-node control decisions. The majority of code in the Cluster Manager is written in Erlang/OTP. Writing correct concurrent code is notoriously difficult, and Erlang/OTP makes it much easier.

The following sections provide a high-level look at the subsystems inside the data and cluster manager systems.

Couchbase Server Data manager

Figure 5 highlights the key subsystems, and their interconnections, in the data path within a Couchbase Server node.

Figure 5: Couchbase Server data manager

TCP ports

The Couchbase Server data manager listens for requests on two TCP ports (the port numbers are configurable; defaults are shown):

Port 11211. This is the traditional memcached port number. It processes requests from clients supporting version 1.0 of the memcapable API specification. These clients rely on a consistent hashing algorithm to map keys directly to servers in a variable-length server list. Most memcached clients today support memcapable 1.0. Memcapable 2.0 clients for the most popular platforms are being introduced, for example spymemcached for Java, enyim for .NET, fauna for Ruby, and libmemcached for C and the other languages that wrap this client library.

Port 11210. This port is directly accessible to clients implementing version 2.0 of the memcapable API. These clients are vbucket-aware. They use a hashing algorithm to map keys to one of a fixed number of vbuckets; in Couchbase Server, the key space is grouped into 1024 vbuckets (see vbucket in the Glossary). vbuckets are then mapped to servers. This layer of indirection enables dynamic cluster rebalancing, non-disruptive cluster expansion or contraction, replication, failover and a host of other capabilities.

Embedded Moxi

For non-vbucket-aware clients, moxi provides high-performance proxy services. When clients send operations to port 11211, moxi processes them. If required, moxi forwards each operation to the server(s) currently servicing requests for the key(s) it references. vbucket-aware clients do not need this mapping and forwarding function.

Memcached protocol listener/sender

As mentioned previously, the latest stable memcached front-end source code is linked directly into Couchbase Server. This guarantees protocol compatibility with memcached (both ASCII and binary protocols) now and into the future. The subsystem embodies a number of capabilities: network listener, protocol parser, thread manager and the tap stream sender logic.
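For contrast with vbuckets, here is a minimal consistent-hash ring of the kind memcapable 1.0 clients rely on. It is a sketch, not the ketama algorithm that any particular client implements. The class name `HashRing`, the MD5-based point hash and the 100 points per server are all assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Minimal consistent-hash ring mapping keys directly to servers."""

    def __init__(self, servers: List[str], points_per_server: int = 100) -> None:
        points: List[Tuple[int, str]] = []
        for server in servers:
            for i in range(points_per_server):  # several points smooth the distribution
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(value: str) -> int:
        # First 4 bytes of an MD5 digest as an unsigned 32-bit ring position.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

    def server_for(self, key: str) -> str:
        # First point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._hashes)
        return self._servers[idx]


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    before = HashRing(["a:11211", "b:11211", "c:11211"])
    after = HashRing(["a:11211", "b:11211", "c:11211", "d:11211"])
    moved = [k for k in keys if before.server_for(k) != after.server_for(k)]
    print(f"{len(moved)} of {len(keys)} keys moved")
```

Adding a server only moves the keys that now land on the new server's points. However, every client must agree on the same server list. vbucket-aware clients avoid this by keeping the key-to-vbucket hash fixed and letting the cluster publish a new vbucket-to-server map.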

Couchbase Server storage engine

The Couchbase Server storage engine does the heavy lifting of caching and persisting data within a Couchbase Server node.

Figure 6: Data storage hierarchy behind the Couchbase Server storage engine

As shown in Figure 6, the Couchbase Server storage engine can manage a hierarchy of storage media, including main memory and spinning disk drives. Couchbase Server supports both on- and off-node storage. Each node can be configured to use local storage media, to store data on an external data path, or to mix the two. Data is automatically migrated up and down the latency/cost stack (RAM to disk) based on data access patterns (Figure 7).

Figure 7: Data migrates up and down the latency stack
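Couchbase Server drives this migration with an LRU algorithm. The sketch below models just two tiers, RAM over disk, and uses Python's `OrderedDict` to track recency. The class `TieredLRUStore`, the item-count capacity and the write-through to disk are assumptions made for illustration. An SSD tier between the two would follow the same pattern.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredLRUStore:
    """RAM tier over a disk tier; least recently used items leave RAM first."""

    def __init__(self, ram_capacity: int) -> None:
        self.ram_capacity = ram_capacity                      # max items in RAM (assumed unit)
        self.ram: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first, newest last
        self.disk: Dict[str, bytes] = {}                      # every item is also persisted here

    def put(self, key: str, value: bytes) -> None:
        self.ram[key] = value
        self.ram.move_to_end(key)   # mark as most recently used
        self.disk[key] = value      # simplification: write-through, disk always has a copy
        self._age_out()

    def get(self, key: str) -> Optional[bytes]:
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.disk:        # cold item: promote it back into RAM
            self.ram[key] = self.disk[key]
            self._age_out()
            return self.disk[key]
        return None

    def _age_out(self) -> None:
        # Drop the least recently used items from RAM; disk still holds them.
        while len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)
```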

In Couchbase Server, data migration is based on an LRU algorithm. Recently used items are kept in low-latency media, while colder items age out, first to SSD (if available) and then to spinning media. Alternative algorithms for storage migration (and for replication management, covered later) offer a rich set of community research and development opportunities.

Couchbase Server Cluster manager

The Couchbase Server cluster manager has four main roles:
- It monitors health and coordinates data manager behavior on each node.
- It configures and supervises inter-node behavior (e.g., replication streams and rebalancing operations).
- It provides aggregation and consensus functions for the cluster (e.g., global singleton election).
- It provides a RESTful cluster management API.

The cluster manager is built atop Erlang/OTP, a proven environment for building and operating robust, fault-tolerant distributed applications.

Figure 8: Couchbase Server cluster manager

TCP ports

The Couchbase Server cluster manager listens for HTTP requests on a configurable TCP port (default 8091). A REST API and web user interface receive and process this traffic. By default, port 4369 and the range 21100-21199 are dedicated to Erlang/OTP functions. The Erlang port mapper runs on 4369, and inter-Erlang-node communications operate in the 211xx range.

REST management API

This port services cluster management requests via a published RESTful API. A CLI utility built on the REST interface provides a convenient way to manage a Couchbase Server cluster programmatically. Figure 9 summarizes the capabilities of the Couchbase Server CLI and the underlying REST API.

Figure 9: CLI utility uses Couchbase Server REST interface
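As an illustration of driving the REST API programmatically, the sketch builds (but does not send) a request for cluster information on the default port 8091. Several details are assumptions here: the `/pools/default` path, HTTP Basic authentication, and the `Administrator`/`password` placeholder credentials. Check the REST API reference for your Couchbase Server version. `cluster_info_request` is a hypothetical helper.

```python
import base64
import urllib.request


def cluster_info_request(host: str = "localhost", port: int = 8091,
                         user: str = "Administrator",
                         password: str = "password") -> urllib.request.Request:
    """Build a GET request for cluster details (endpoint and credentials assumed)."""
    url = f"http://{host}:{port}/pools/default"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")  # HTTP Basic authentication
    return req


if __name__ == "__main__":
    req = cluster_info_request()
    print(req.get_method(), req.full_url)
    # Against a running node you could send it with:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.read()[:200])
```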

Per node configuration management and monitoring functions

The Couchbase Server cluster manager executes on each node in a Couchbase Server cluster. Four primary subsystems operate on each node:

1. Heartbeat. A watchdog process periodically communicates with the currently elected cluster leader (the node with the global singleton) to provide Couchbase Server health updates.
2. Process monitor. This subsystem monitors execution of the local data manager, restarting failed processes as required and contributing status information to the heartbeat module.
3. Configuration Manager. Each Couchbase Server node has a configuration: a vbucket map, active replication streams, a target rebalance map, etc. The configuration manager receives, processes and monitors local configuration, in concert with a cluster-wide configuration distribution system.
4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is responsible for electing a cluster leader. If the local node is the current leader, it also supervises per-cluster processes.

Per cluster functions

The per-node functions above always execute on every node in a Couchbase Server cluster. In addition, a set of functions is active on only one node in the cluster at any point in time. Possession of the global singleton data structure tells a node that it should execute these functions.

1. Rebalance Orchestrator. The rebalance orchestrator calculates and distributes a rebalance operation and supervises it across the cluster. When a rebalance operation is initiated, the orchestrator:
   - calculates a target vbucket map based on the current pending set of servers to be added to and removed from the cluster;
   - distributes commands to individual nodes to build a network of vbucket migration streams;
   - monitors migration completion events, updating and distributing the current vbucket map as migrations complete.
   A companion white paper details the operation of the Couchbase Server rebalance orchestrator.
2. Node Health Monitor. The node health monitor (also known as "the Doctor") receives heartbeat updates from individual nodes in the cluster, updating configuration and raising alerts as required.
3. vbucket state and replication manager. Responsible for establishing and monitoring the current network of replication streams.
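The orchestrator's comparison of current and target maps can be sketched as a simple diff. Each map is assumed to be a list of rows, one per vbucket, of the form `[master, replica1, ...]`. The `Migration` record and `plan_migrations` function are hypothetical. A real orchestrator would also sequence, throttle and supervise the resulting tap streams.

```python
from typing import List, NamedTuple

VBucketMap = List[List[str]]  # one row per vbucket: [master, replica1, ...]


class Migration(NamedTuple):
    vbucket: int
    role: str        # "master" or "replica"
    old_server: str
    new_server: str


def plan_migrations(current: VBucketMap, target: VBucketMap) -> List[Migration]:
    """List every (vbucket, role) whose server differs between the two maps."""
    if len(current) != len(target):
        raise ValueError("maps must cover the same vbuckets")
    plan: List[Migration] = []
    for vb, (cur_row, tgt_row) in enumerate(zip(current, target)):
        for pos, (old, new) in enumerate(zip(cur_row, tgt_row)):
            if old != new:
                role = "master" if pos == 0 else "replica"
                plan.append(Migration(vb, role, old, new))
    return plan


if __name__ == "__main__":
    current = [["a", "b"], ["b", "a"], ["a", "b"]]
    target = [["a", "b"], ["c", "a"], ["a", "c"]]
    for step in plan_migrations(current, target):
        print(step)
```

In this model, a rebalance is finished when `plan_migrations(current, target)` returns an empty list. This matches the completion condition in the Target vbucket Map entry of the Glossary.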

Getting started with Couchbase Server

Couchbase Server is freely available in both binary and source form. Downloading, installing and configuring Couchbase Server takes less than five minutes. This paper has outlined the internal workings of Couchbase Server. However, the only way to really get a feel for the technology is to experience its simple, fast and elastic properties first-hand and see how it may be useful in your application development environment. To download Couchbase Server, go to http://www.couchbase.com/downloads.

Glossary

Bucket: A Bucket is a Couchbase Server data partition with its own keyspace, so each Bucket has its own vbucket map. Couchbase Server allows multiple buckets to exist on a single Couchbase Server cluster, providing secure multi-tenancy and separation of data sets. Each bucket can have its own properties and settings (e.g., replication count, blocking behavior, and cache and storage quotas). In most cases, a bucket can be thought of as a virtual Couchbase Server cluster.

Cache: The caching layer in Couchbase Server is derived from the memcached open source project. The Couchbase Server cache transparently provides in-memory caching services to any application interacting with Couchbase Server.

Couchbase Server: A distributed database management system optimized for storing data behind interactive web applications.

Couchbase Server Cluster Manager: A Couchbase Server module, written in Erlang, that provides a number of cluster-wide services such as consensus formation, configuration management/distribution and rebalance orchestration. To maximize performance, the cluster manager is never in the data flow path for any data operation, including replication and rebalancing streams. It is responsible only for configuring and coordinating the interaction between servers in a Couchbase Server cluster.

Current vbucket Map: A table identifying the active Master and Replica Servers for each vbucket. During a rebalance operation, the Rebalance Orchestrator updates this map as individual vbucket migrations complete.

Failover: If a server in a Couchbase Server cluster fails, the Failover mechanism can rapidly (< 100 msec) transfer Master Server status for all vbuckets previously mastered on that server. Status moves to the servers that hold replica copies of those vbuckets. This leaves the cluster with one less replica copy of any data object that was stored, in either master or replica form, on the failed-over server. Only one server can service reads and writes for any given vbucket at any point in time. Failover is therefore what keeps all objects stored in Couchbase Server quickly available for reading and writing after a server fails. After initiating a failover, a Couchbase Server cluster administrator will typically repair, add or remove servers. The administrator then rebalances the cluster to restore a full set of replica copies.

Master Migration Tap Stream: A special type of tap stream that copies all data objects in a given vbucket to the server that requested the tap stream. The special behavior happens at the end of the iteration process. It provides a rapid, but orderly, transfer of Master Server status while maintaining data consistency.

Master Server: Each vbucket has one active Master Server at any point in time. The Master Server for a given vbucket is the only server that will accept reads and writes for keys that map to that vbucket.

Migrate: To transfer Master Server or Replica Server status for a given vbucket, along with all the data associated with that vbucket, from one server to another.

Migration Command: A request that the Rebalance Orchestrator can send to a Couchbase Server cluster member, asking for specific actions in support of the rebalancing process. These commands can establish Migration Tap Streams, purge data associated with a given vbucket, or order a server to cease serving as a Master or Replica Server for a given vbucket.

Node: A single server in a Couchbase Server cluster.

Node vbucket Master List: Each server in a Couchbase Server cluster has a Node vbucket Master List, identifying the vbuckets for which it is currently acting as Master Server.

Pending Set: The list of all servers that are to be added to, or removed from, the Couchbase Server cluster during the next rebalance operation. Administrators can add or remove servers through either the graphical or a programmatic interface. Added servers enter a pending add state, and removed servers enter a pending removal state. On the next rebalance operation, the Rebalance Orchestrator places vbucket data on the pending add servers while removing it from the pending removal servers.
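Failover, as described above, can be modeled as a pure function over the vbucket map. This sketch assumes rows of the form `[master, replica1, ...]`, with `None` marking an empty slot; `fail_over` is a hypothetical name. The function drops the failed server and shifts the survivors left. Wherever the failed server was master, this promotes the first replica. Every row that referenced the failed server ends up with one fewer replica copy.

```python
from typing import List, Optional

Row = List[Optional[str]]  # [master, replica1, ...]; None marks an empty slot


def fail_over(vb_map: List[Row], failed: str) -> List[Row]:
    """Return a new vbucket map with `failed` removed from every row."""
    new_map: List[Row] = []
    for row in vb_map:
        survivors = [s for s in row if s is not None and s != failed]
        # The first survivor becomes (or stays) master; if `failed` was master,
        # this promotes its first replica. Rows keep their length, padded with None.
        new_map.append(survivors + [None] * (len(row) - len(survivors)))
    return new_map


if __name__ == "__main__":
    vb_map: List[Row] = [["a", "b"], ["b", "c"], ["c", "a"]]
    print(fail_over(vb_map, "b"))
```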

Persistence: Storing data in a technology that enables retrieval even in the case of complete data center power loss. Couchbase Server has a multi-tier persistence model. Data can be stored on SSD devices or on spinning disk media, and is automatically migrated to the lowest-latency device available based on data access patterns. Couchbase Server uses an LRU model, which migrates data based on temporal access patterns.

Rebalance: The systematic process of redistributing data within a live cluster. In Couchbase Server, the Rebalance Orchestrator selects certain vbuckets and migrates them, including their data objects, from old (Current) to new (Target) servers. Rebalancing moves both Master and Replica copies of objects. The intent is to spread the data, and in particular I/O requests, evenly across the cluster. Rebalancing is typically done after servers are added to or removed from a cluster. A Couchbase Server rebalance operation can be stopped and restarted at any time.

Rebalance Calculator: Logic in the Couchbase Server Cluster Manager subsystem that calculates a Target vbucket Map. It takes as input the Current vbucket Map and the Pending Set, calculates the optimal placement of vbuckets, and returns the Target vbucket Map.

Rebalance Orchestrator: Logic within the Couchbase Server Cluster Manager, executed on the node with the global singleton, that coordinates a rebalancing process. It does this primarily by issuing Migration Commands to individual servers in the cluster.

Replica Migration Tap Stream: A special type of tap stream that copies all data objects in a given vbucket to the server that requested the tap stream. The special behavior happens at the end of the iteration process. It provides a rapid, but orderly, transfer of Replica Server status while maintaining data consistency.

Replica Server: Couchbase Server replicates object data to Replica Servers; the number of Replicants is user-defined. If the original Master Server fails, a Replica Server can rapidly (within 100 msec) become the Master Server for a given key.

Replicant: A replica (backup) copy of an object stored in Couchbase Server.

Replication: The process of storing multiple copies of an object across different servers, facilitating high availability of any object stored in the cluster. Specifically, Replication supports rapid accessibility of an object via the Couchbase Server Failover mechanism. Couchbase Server supports both Master-Slave and Peer-to-Peer replication topologies.
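A Rebalance Calculator takes the Current vbucket Map and the Pending Set and returns a Target vbucket Map. The greedy sketch below is an assumption, not Couchbase Server's actual algorithm, and `calculate_target_map` is a hypothetical name. It keeps each vbucket's master in place if that server stays in the cluster and is under a per-server quota. It moves the remaining vbuckets to the least-loaded servers. It then places replicas on the next servers in sorted order. That replica rule keeps replicas balanced and on distinct servers, but makes no attempt to minimize replica movement.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

VBucketMap = List[List[str]]  # one row per vbucket: [master, replica1, ...]


def calculate_target_map(current: VBucketMap, add: Sequence[str],
                         remove: Sequence[str]) -> VBucketMap:
    """Greedy target map from the current map and the pending set."""
    servers = sorted(({s for row in current for s in row} - set(remove)) | set(add))
    copies = len(current[0])                  # master plus replicas per vbucket
    if copies > len(servers):
        raise ValueError("not enough servers for the replica count")
    quota = -(-len(current) // len(servers))  # ceiling of vbuckets per server
    load: Dict[str, int] = {s: 0 for s in servers}
    masters: List[Optional[str]] = []

    # Pass 1: keep the current master if it stays in the cluster and is under quota.
    for row in current:
        m = row[0]
        if m in load and load[m] < quota:
            masters.append(m)
            load[m] += 1
        else:
            masters.append(None)

    # Pass 2: give every remaining vbucket to the least-loaded server.
    for vb, m in enumerate(masters):
        if m is None:
            choice = min(servers, key=lambda s: (load[s], s))
            masters[vb] = choice
            load[choice] += 1

    # Replicas go on the next servers in sorted order, wrapping around.
    n = len(servers)
    target: VBucketMap = []
    for m in masters:
        i = servers.index(m)
        target.append([servers[(i + k) % n] for k in range(copies)])
    return target


if __name__ == "__main__":
    current = [["a", "b"] if vb % 2 == 0 else ["b", "a"] for vb in range(1024)]
    target = calculate_target_map(current, add=["c"], remove=[])
    print(Counter(row[0] for row in target))
```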

Tap Stream: A publish-and-subscribe mechanism allowing a subscribing server to request copies of all data objects associated with one or more vbuckets on the publishing server. A number of Tap Stream types allow only subsets of the data to be streamed, based on time and other selection filters. Tap Streams are a core building block of Couchbase Server replication and dynamic cluster rebalancing.

Target vbucket Map: The vbucket Map that represents the state a cluster will be in once a currently running rebalance operation completes. The Rebalance Orchestrator compares the Target and Current maps to determine which Migration Tap Streams to create and supervise. The rebalance operation is complete when the Current and Target vbucket Maps are identical.

vbucket: A vbucket is the owner of a subset of the key space of a Couchbase Server cluster. Every key is contained within a vbucket. A mapping function calculates the vbucket to which a given key belongs. In Couchbase Server, the mapping function is a hash function that takes a key as input and outputs a vbucket identifier.

vbucket Map: A table identifying the servers acting as Master and Replica Servers for each vbucket. A server appearing in this table can be (and usually is) responsible for multiple vbuckets. The number of vbuckets in a Couchbase Server cluster must exceed the number of physical servers that may eventually be present in the cluster. In Couchbase Server, the vbucket map supports up to 1024 servers per cluster. See also Current vbucket Map and Target vbucket Map.
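The backfill phase of a tap stream can be sketched as a filtered iteration over the publisher's vbuckets. The `Item` and `Publisher` types and the integer `mutated_at` timestamp are hypothetical. A real tap stream would also keep streaming new mutations after the backfill. It would also carry the end-of-stream handoff used by migration tap streams.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set, Tuple


@dataclass
class Item:
    key: str
    value: bytes
    mutated_at: int  # hypothetical logical timestamp of the last change


@dataclass
class Publisher:
    # vbucket id -> {key: Item}
    vbuckets: Dict[int, Dict[str, Item]] = field(default_factory=dict)

    def store(self, vb: int, item: Item) -> None:
        self.vbuckets.setdefault(vb, {})[item.key] = item

    def tap(self, wanted: Set[int], since: int = 0) -> Iterator[Tuple[int, Item]]:
        """Backfill: yield (vbucket, item) for the subscribed vbuckets,
        keeping only items changed at or after `since` (a time filter)."""
        for vb in sorted(wanted):
            for item in self.vbuckets.get(vb, {}).values():
                if item.mutated_at >= since:
                    yield vb, item


if __name__ == "__main__":
    pub = Publisher()
    pub.store(3, Item("a", b"1", mutated_at=10))
    pub.store(3, Item("b", b"2", mutated_at=20))
    pub.store(7, Item("c", b"3", mutated_at=30))
    for vb, item in pub.tap({3, 7}, since=15):
        print(vb, item.key)
```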