Couchbase Server Technical Overview. Key concepts, system architecture and subsystem design
- Hollie Hensley
- 8 years ago
Table of Contents

What is Couchbase Server?
System overview and architecture
    Overview
        Couchbase Server and client software
        Couchbase Server in the application stack
    Data flow in a Couchbase Server environment
        Between application and Couchbase Server
        Within the Couchbase Server cluster
    Top-level software block architecture
Couchbase Server Data manager
    TCP ports
    Embedded Moxi
    Memcached protocol listener/sender
    Couchbase Server storage engine
Couchbase Server Cluster manager
    TCP ports
    REST management API
    Per node configuration management and monitoring functions
    Per cluster functions
Getting started with Couchbase Server
Glossary
What is Couchbase Server?

Couchbase Server is a simple, fast, elastic NoSQL database, optimized for the data management needs of interactive web applications. Couchbase Server makes it easy to optimally match resources to the changing needs of an application by automatically distributing data and I/O across commodity servers or virtual machines. It scales out and supports live cluster topology changes while continuing to service data operations. Its managed object caching technology delivers consistent, sub-millisecond random reads, while sustaining high-throughput writes. As a document-oriented database, Couchbase Server accommodates changing data management requirements without the burden of schema management.

Key Couchbase Server characteristics and capabilities include:

Push-button elasticity
- Add or remove multiple servers simultaneously with the push of a button
- Efficient data rebalancing without requiring application changes

Memcached compatible
- Easy to get started with Couchbase: a drop-in replacement for memcached
- Simple, easy to use and widely supported key-value interface

Zero-downtime maintenance
- Add or remove servers, upgrade software and perform any maintenance tasks in a live cluster
- No application downtime required
- No application performance degradation

Enterprise class monitoring and administration
- Deeply instrumented monitoring with rich administration GUI
- Dynamic system monitoring charts
- Backup and restore capability

RESTful management API
- Easy interface to external monitoring and management systems
- Easy to automate deployment to the cloud

Reliable low-latency storage architecture
- Memcached inside. Caching technology has 10 years of production maturity and powers 18 of the top 20 web applications on the planet
- Efficient use of memory (object-level cache prevents thrashing inherent to page-level approaches)
- Predictable low latency
- No memory mapped files
- Pull the plug on a server without fear of stored data corruption

Data replication with auto-failover
- Maintain multiple copies of your data within the cluster for high-availability
- User configurable replication count
- User configurable failover policy to ensure data availability in the face of hardware failure
Professional SDKs for a wide variety of languages
- Well-documented, easy-to-use SDKs make it easy for developers to build applications that store data in Couchbase
- Support for Java, C#, PHP, C, Python, Ruby

At the highest level, Couchbase Server is simple, fast, elastic, and reliable. Every feature and design decision is weighed against these core principles:

Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and using it. As a document database, there is no need to create and manage schemas, and never a need to normalize, shard or tune the database. Build applications faster, keep them running reliably and easily adapt them to changing business requirements.

Fast. Couchbase Server is screamingly, predictably fast. It is the lowest latency, highest throughput NoSQL database technology available. Read and write data with consistently low latency and sustained high throughput across the scaling spectrum. Get the performance you need at lower cost.

Elastic. By automatically distributing data and I/O across commodity servers or virtual machines, Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes, or shrink a cluster, to sustain application performance while precisely matching cost to demand. There are no single points of failure in a Couchbase cluster and all operations function across the entire cluster. Sophisticated replication and persistence subsystems guarantee continuous operations.

Reliable. Couchbase Server is enterprise-ready software that you can depend on for mission critical applications. With zero-downtime maintenance and rich monitoring capabilities, deploy mission critical applications with confidence.
System overview and architecture

Overview

Couchbase Server and client software

A Couchbase Server is a computer (e.g., commodity Intel server, VMware virtual machine, Amazon machine instance) running Couchbase Server software. Couchbase Server runs on 32- and 64-bit Linux, Windows and Mac operating systems. The source code is a mix of C, C++ and Erlang, with some utility functionality authored in Python.

Each server in a Couchbase Server cluster runs identical Couchbase Server software, meaning all Couchbase Server nodes are created equal. A number of benefits flow from the decision to avoid special-case nodes running differentiated software or exhibiting differentiated functionality (e.g., masters, slaves, cluster managers, configuration servers):

1. No single point of failure. Nodes can fail at any time (up to the replication count of the cluster) and a Couchbase Server cluster can continue to process data operations for the entire key space of data, with no loss of administrative functionality. If the server holding the global singleton (the elected leader of the cluster) is lost, the Erlang-based cluster management system will elect a new leader and cluster management operations will continue without impacting the applications on top. And given the distributed architecture of Couchbase Server, even if the cluster management subsystem were to fail completely, data operations would continue uninterrupted.

2. Get started with one node. The full functionality of Couchbase Server is available with just a single package installation. Download, install and begin using Couchbase Server in five minutes or less, on just one node if desired.

3. Clone to grow. Because all nodes are alike, you can literally clone a virtual machine running Couchbase Server software, join it to a cluster (one mouse click) and rebalance the cluster (another mouse click) to migrate data to the new server, balancing data and I/O across the cluster. You can do this with many servers at once, and the entire process can be automated through use of the Couchbase Server CLI utility or REST calls.

An application interacts with a Couchbase Server cluster through a memcached client library, typically over a network connection. The client library employs an algorithm (pluggable, but a hashing algorithm is the default in Couchbase Server) to calculate the virtual bucket (vbucket) in which a given key's value is to be located. Couchbase Server will hash a key to 1 of 1024 vbuckets.
The vbucket number is then used as an index by the client to look up, in the vbucket map data structure, the individual server in the cluster responsible for the data in that vbucket (including master and replica server responsibilities). Memcached client libraries are available for practically every language and application framework.

Couchbase Server in the application stack

As shown in Figure 1, Couchbase Server supports a scale-out architecture at the data layer. Couchbase Servers are deployed as a cluster behind web application servers, spreading the data and I/O operations evenly across the cluster. Servers can be added to, and removed from, a live cluster. This deployment model matches what is already best practice architecture at the application logic tier, where new web servers are deployed alongside existing servers and placed into rotation behind a load balancer. With Couchbase Server, client-side logic effectively load balances data operations across the cluster through a key hashing and server mapping algorithm.

Figure 1: Couchbase Server deployment architecture (load balancer, web servers, Couchbase Servers)
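The key hashing and server mapping described above can be sketched in a few lines of Python. This is only an illustration: the paper states that keys hash to one of 1024 vbuckets but does not name the hash function, so CRC32 is used here as an assumed stand-in, and `build_vbucket_map`, `master_for_key` and the node names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # fixed vbucket count stated in the paper


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a key to a vbucket id. CRC32 is an assumed stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets=NUM_VBUCKETS, replicas=1):
    """Hypothetical round-robin map: entry i is [master, replica, ...]."""
    if replicas + 1 > len(servers):
        raise ValueError("need at least replicas + 1 servers")
    return [
        [servers[(vb + r) % len(servers)] for r in range(replicas + 1)]
        for vb in range(num_vbuckets)
    ]


def master_for_key(key, vbucket_map):
    """Client-side lookup: key -> vbucket id -> master server."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))][0]


# Example with three hypothetical nodes.
servers = ["node-a", "node-b", "node-c"]
vbucket_map = build_vbucket_map(servers)
print(master_for_key("user:42", vbucket_map))
```

Because the client only ever consults the map, moving a vbucket to another server means changing one map entry; the hash of the key never changes.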
Data flow in a Couchbase Server environment

Between application and Couchbase Server

Figure 2: Data flow from an application to a Couchbase Server cluster. The Couchbase Server (memcached) client hashes KEY and identifies KEY's master server; Couchbase Server replicates the KEY-VALUE pair, caches it in memory and stores it to disk.

Figure 2 shows the flow of data from an application to a Couchbase Server cluster, illustrating a data write operation. The illustration starts at the presentation layer:

1. An application user takes an action that results in the need to update a data item in Couchbase Server.
2. The application server responding to the user action updates the key's value and makes a call to a memcached client library to set the key-value pair.
3. The memcached client library selects the server currently serving as master for the referenced key and transmits the operation to that server.
4. (and 5.) Upon arrival, Couchbase Server replicates, caches and stores the data, as detailed in the next section.

Within the Couchbase Server cluster

Picking up from step 5 in Figure 2, Figure 3 shows the processing of the set operation inside the Couchbase Server cluster.

1. The set arrives at the Couchbase Server listener-receiver.
2. Couchbase Server immediately replicates the data to replica servers (the number of replica copies is user defined). Upon arrival at replica servers, the data is persisted.
3. The data is cached in main memory.
4. The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
5. A set acknowledgment is returned to the application.

Figure 3: Data flow within the Couchbase Server cluster on write

Top-level software block architecture

At the highest level, Couchbase Server has two distinct functional blocks: the Data Manager and the Cluster Manager. With some effort, it is possible to selectively build Couchbase Server completely devoid of a Cluster Management subsystem. Node configuration management, replication, health monitoring and other capabilities would then have to be performed by an external system.

Figure 4: Couchbase Server software architecture
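Returning briefly to the write path of Figure 3, steps 3 to 5 (cache, de-duplicated persistence queue, acknowledgment) can be modelled with a toy class. This is a sketch under stated assumptions, not the storage engine's implementation: `WriteQueueNode` and its attributes are hypothetical names, a dictionary stands in for disk, and replication (step 2) is omitted.

```python
from collections import OrderedDict


class WriteQueueNode:
    """Toy model of one node's cache plus de-duplicating persistence queue."""

    def __init__(self):
        self.cache = {}               # in-memory object cache
        self.pending = OrderedDict()  # keys awaiting persistence, in arrival order
        self.disk = {}                # stand-in for disk or SSD

    def set(self, key, value):
        self.cache[key] = value
        # De-duplicate: a key that is already queued is not queued again.
        self.pending.setdefault(key, None)
        return "STORED"  # acknowledgment is returned before the disk write

    def flush_one(self):
        """Pull one pending write and persist the value currently in cache."""
        if not self.pending:
            return None
        key, _ = self.pending.popitem(last=False)
        self.disk[key] = self.cache[key]
        return key


node = WriteQueueNode()
node.set("counter", "1")
node.set("counter", "2")   # second write to the same key while one is pending
node.flush_one()           # a single disk write carries the latest value
```

Because the value is read from cache at flush time, repeated updates to a hot key collapse into one disk write.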
Data Manager. The data manager does the work of storing and retrieving data in response to data operation requests from applications. It exposes two memcapable ports to the network: one port supports non-vbucket-aware memcached client libraries (pre-memcapable 2.0 API), which are proxied if required; the other port expects to communicate with vbucket-aware clients (memcapable 2.0+ API). The majority of code in the Data Manager is C and C++.

Cluster Manager. The cluster manager supervises the configuration and behavior of all nodes in a Couchbase Server cluster. Cluster management code runs on every node in the cluster, but at any point in time one node (the one holding a global singleton) is elected to perform aggregation, consensus building and cross-node control decisions. The majority of code in the Cluster Manager is written in Erlang/OTP, a language which makes writing correct concurrent code (notoriously difficult) nearly effortless.

The following sections provide a high-level look at the subsystems inside the data and cluster manager systems.

Couchbase Server Data manager

Figure 5 below highlights the key subsystems, and their interconnections, in the data path within a Couchbase Server node.

Figure 5: Couchbase Server data manager
TCP ports

The Couchbase Server data manager listens for requests on two TCP ports (the port numbers are configurable):

Port 11211. The traditional memcached port number processes requests from clients supporting version 1.0 of the memcapable API specification. These clients rely on a consistent hashing algorithm to map keys directly to servers in a variable-length server list. Most memcached clients today support memcapable 1.0, though memcapable 2.0 clients for the most popular platforms are being introduced (e.g., spymemcached for Java, Enyim for .NET, fauna for Ruby, libmemcached for C and other languages that wrap this client library).

The second port. A port directly accessible to clients implementing version 2.0 of the memcapable API. These clients are vbucket aware, using a hashing algorithm to map keys to one of a fixed number of vbuckets (in Couchbase Server, the key space is grouped into 1024 vbuckets). [For more information on vbuckets, see the vbucket entries in the Glossary later in this document.] vbuckets are then mapped to a server, providing a layer of indirection that enables dynamic cluster rebalancing, non-disruptive cluster expansion or contraction, replication, failover and a host of other capabilities.

Embedded Moxi

For non-vbucket-aware clients, moxi provides high-performance proxy services. When clients send operations to port 11211, moxi processes them and, if required, forwards them to the server(s) currently servicing requests for the key(s) referenced by the operation. This mapping and forwarding function is unnecessary for vbucket-aware clients.

Memcached protocol listener/sender

The latest stable memcached front-end source code is directly linked into Couchbase Server, guaranteeing protocol compatibility with memcached (both ASCII and binary protocols) now and into the future. A number of capabilities are embodied within this subsystem: network listener, protocol parser, thread manager, and the tap stream sender logic.
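To make the protocol compatibility concrete, the following sketch encodes a `set` request in the memcached ASCII protocol, the format a memcapable 1.0 client would send to port 11211. The wire format (`set <key> <flags> <exptime> <bytes>` followed by the data block) and the `STORED` reply come from the memcached protocol description; the helper names `encode_set` and `parse_set_reply` are hypothetical, and no network connection is made.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached ASCII-protocol 'set' request as bytes."""
    bad_key = (
        not key
        or len(key) > 250
        or not key.isascii()
        or any(ord(c) <= 32 or ord(c) == 127 for c in key)
    )
    if bad_key:
        raise ValueError("key must be 1-250 ASCII characters without whitespace or control characters")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def parse_set_reply(reply: bytes) -> bool:
    """Return True when the server acknowledged the set."""
    return reply == b"STORED\r\n"


request = encode_set("user:42", b"hello")
print(request)
```

A client would write these bytes to a TCP socket and read one reply line back; the same request works whether moxi proxies it or the node serves it directly.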
Couchbase Server storage engine

The Couchbase Server storage engine does the heavy lifting of caching and persisting data within a Couchbase Server node.

Figure 6: Data storage hierarchy behind the Couchbase Server storage engine

As shown in Figure 6, the Couchbase Server storage engine can manage a hierarchy of storage media, including main memory and spinning disk drives. Couchbase Server supports both on- and off-node storage; each node can be configured to use local storage media or to store data on an external data path, including mixing the two. Data is automatically migrated up and down the latency/cost stack (RAM-Disk) based on data access patterns (Figure 7).

Figure 7: Data migrates up and down the latency stack
In Couchbase Server, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while aging out colder items: first to SSD (if available) and then to spinning media. Alternative storage migration (and replication management, covered later) algorithms offer a rich set of community research and development opportunities.

Couchbase Server Cluster manager

The Couchbase Server cluster manager monitors health and coordinates data manager behavior on each node; configures and supervises inter-node behavior (e.g. replication streams and rebalancing operations); provides aggregation and consensus functions for the cluster (e.g. global singleton election); and provides a RESTful cluster management API. The cluster manager is built atop Erlang/OTP, a proven environment for building and operating robust fault-tolerant distributed applications.

Figure 8: Couchbase Server cluster manager

TCP ports

The Couchbase Server cluster manager listens for HTTP requests on a configurable TCP port (default is 8091); a REST API and web user interface receive and process this traffic. By default, port 4369 and a range of further ports are dedicated to Erlang/OTP functions. The Erlang port mapper runs on 4369, and inter-Erlang-node communications operate in the 211xx range.
REST management API

This port services cluster management requests via a published RESTful API. A CLI utility that leverages the REST interface provides a convenient way to programmatically manage a Couchbase Server cluster. Figure 9 summarizes the capabilities of the Couchbase Server CLI (and the underlying REST API).

Figure 9: CLI utility uses Couchbase Server REST interface
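As a rough illustration of talking to the management port, the sketch below builds (but does not send) an HTTP request against port 8091 using only the Python standard library. The `/pools/default` path comes from the Couchbase REST documentation rather than from this paper and may differ between versions; the host name and credentials are hypothetical placeholders.

```python
import base64
import urllib.request


def build_cluster_info_request(host, user, password, port=8091):
    """Build a GET request for cluster details on the management port."""
    url = f"http://{host}:{port}/pools/default"  # path assumed from REST docs
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_cluster_info_request("cb-node1.example.com", "admin", "secret")
print(request.get_method(), request.full_url)

# Against a live cluster, the request could then be sent with:
#   with urllib.request.urlopen(request) as response:
#       cluster = json.load(response)
```

Separating request construction from sending keeps the example runnable without a cluster and makes the same builder easy to reuse from automation scripts.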
Per node configuration management and monitoring functions

The Couchbase Server cluster manager executes on each node in a Couchbase Server cluster. There are four primary subsystems that operate on each node.

1. Heartbeat. A watchdog process periodically communicates with the currently elected cluster leader (the node with the global singleton) to provide Couchbase Server health updates.

2. Process monitor. This subsystem monitors execution of the local data manager, restarting failed processes as required and contributing status information to the heartbeat module.

3. Configuration Manager. Each Couchbase Server node has a configuration: a vbucket map, active replication streams, a target rebalance map, etc. The configuration manager receives, processes and monitors local configuration, in concert with a cluster-wide configuration distribution system.

4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is responsible for electing a cluster leader and supervising per-cluster processes if the local node is the current leader.

Per cluster functions

In addition to the per-node functions, which are always executing on each node in a Couchbase Server cluster, there is a set of functions that are active on only one node in the cluster at any point in time. Possession of a global singleton data structure indicates to a node that it should execute these functions.

1. Rebalance Orchestrator. The rebalance orchestrator calculates, distributes and provides cluster-wide supervision of a rebalance operation. When a rebalance operation is initiated, it calculates a target vbucket map based on the current pending set of servers to be added to and removed from the cluster; distributes commands to individual nodes to build a network of vbucket migration streams; and monitors migration completion events, updating and distributing the current vbucket map as migrations complete (note: there is a companion white paper that details the operation of the Couchbase Server rebalance orchestrator).

2. Node Health Monitor. The node health monitor (also known as The Doctor) receives heartbeat updates from individual nodes in the cluster, updating configuration and raising alerts as required.

3. vbucket state and replication manager. Responsible for establishing and monitoring the current network of replication streams.
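The rebalance orchestrator's three steps (calculate a target map, derive migrations, update the current map as migrations complete) can be sketched as follows. This is a deliberately naive model and not the orchestrator's actual algorithm: it tracks master servers only, assigns vbuckets round-robin, and makes no attempt to minimise data movement. All function names are hypothetical.

```python
def target_vbucket_map(current_map, add=(), remove=()):
    """Naive target map: round-robin masters over the post-rebalance server set."""
    servers = sorted((set(current_map) | set(add)) - set(remove))
    if not servers:
        raise ValueError("no servers would remain in the cluster")
    return [servers[vb % len(servers)] for vb in range(len(current_map))]


def migrations(current_map, target_map):
    """List (vbucket, source, destination) for every vbucket that must move."""
    return [
        (vb, src, dst)
        for vb, (src, dst) in enumerate(zip(current_map, target_map))
        if src != dst
    ]


def run_rebalance(current_map, target_map):
    """Apply migrations one at a time, updating the current map as each completes."""
    current = list(current_map)
    for vb, _src, dst in migrations(current, target_map):
        current[vb] = dst  # models a migration completion event
    return current


# Eight vbuckets on two hypothetical nodes; node-c is in the pending-add set.
current = ["node-a", "node-b"] * 4
target = target_vbucket_map(current, add=["node-c"])
print(migrations(current, target))
```

The rebalance is finished when the current and target maps are identical, which is exactly the condition `run_rebalance` reaches once its loop ends.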
Getting started with Couchbase Server

Couchbase Server is freely available in both binary and source form. Downloading, installing and configuring Couchbase Server takes less than five minutes. This paper outlined the internal workings of Couchbase Server, but experiencing the simple, fast and elastic properties of Couchbase Server first-hand is the only way to really get a feel for the technology and how it may be useful in your application development environment. To download Couchbase Server, go to

Glossary

Bucket: A Bucket is a Couchbase Server data partition with its own keyspace. Each Bucket therefore has its own vbucket map. Couchbase Server allows multiple buckets to exist on a single Couchbase Server cluster, providing secure multi-tenancy and separation of data sets. Each bucket can have its own properties and settings (e.g., replication count, blocking behavior, and cache and storage quotas). In most cases, a bucket can be thought of as a virtual Couchbase Server cluster.

Cache: The caching layer in Couchbase Server is derived from the Memcached open source project. The Couchbase Server Cache transparently provides in-memory caching services to any application interacting with Couchbase Server.

Couchbase Server: A distributed database management system optimized for storing data behind interactive web applications.

Couchbase Server Cluster Manager: A Couchbase Server module (written in Erlang) which provides a number of cluster-wide services, such as consensus formation, configuration management/distribution, and rebalance orchestration. To maximize performance, the cluster manager is never in the data flow path for any data operation (including replication and rebalancing streams). It is responsible only for configuring and coordinating the interaction between servers in a Couchbase Server cluster.

Current vbucket Map: A table identifying the active Master and Replica Servers for each vbucket. During a rebalance operation, this map is updated by the Rebalance Orchestrator as individual vbucket migrations complete.
Failover: If a server in a Couchbase Server cluster fails, the Failover mechanism can rapidly (< 100 msec) transfer Master Server status for all vbuckets previously mastered on that server to servers which have replica copies of those vbuckets. This operation leaves the cluster with one fewer replica copy of any data object which was stored (either in master or replica form) on the failed-over server. Failover ensures all objects stored in Couchbase Server are quickly available to an application for reading and writing following failure of a server (because only one server can service reads and writes for any given vbucket at any point in time). After initiating a failover, a Couchbase Server cluster administrator will typically repair, add or remove servers, then rebalance the cluster to restore a full set of replica copies.

Master Migration Tap Stream: A special type of tap stream that copies all data objects in a given vbucket to the server which requested the tap stream. The special behavior happens at the end of the iteration process and provides a rapid, but orderly, transfer of Master Server status while maintaining data consistency.

Master Server: Each vbucket has one active Master Server at any point in time. The Master Server for a given vbucket is the only server that will accept reads and writes for keys that map to that vbucket.

Migrate: To transfer Master Server or Replica Server status for a given vbucket (along with all the data associated with that vbucket) from one server to another.

Migration Command: A request which can be sent to a Couchbase Server cluster member by the Rebalance Orchestrator, asking for specific actions in support of the rebalancing process. These commands can be used to establish Migration Tap Streams, to purge data associated with a given vbucket, or to order a Server to cease serving as a Master or Replica Server for a given vbucket.

Node: A single server in a Couchbase Server cluster.
Node vbucket Master List: Each server in a Couchbase Server cluster has a Node vbucket Master List, identifying the vbuckets for which it is currently acting as Master Server.

Pending Set: The list of all servers which are to be added to, or removed from, the Couchbase Server cluster during the next rebalance operation. When administrators add servers to a Couchbase Server cluster, whether through the graphical or a programmatic interface, those new servers enter a pending add state; when administrators remove servers from the Couchbase Server cluster, they enter a pending removal state. On the next Rebalance operation, the Rebalance Orchestrator places vbucket data on the pending add servers while removing it from the pending removal servers.
Persistence: Storing data in a technology that enables retrieval even in the case of complete data center power loss. Couchbase Server has a multi-tier persistence model: data can be stored in SSD devices or on spinning disk media, with auto-migration of the data to the lowest-latency device available, based on data access patterns. Couchbase Server uses an LRU model which migrates data based on temporal access patterns.

Rebalance: The systematic process of redistributing data within a live cluster. In Couchbase Server, the Rebalance Orchestrator rebalances by selecting and then migrating certain vbuckets, including the data objects belonging to each vbucket, from old (Current) to new (Target) servers. Rebalancing will move both Master and Replica copies of objects. The intent is to spread the data, and in particular I/O requests, evenly across the cluster. Rebalancing is typically done following the removal or addition of servers to a cluster. A Couchbase Server rebalance operation can be stopped and restarted at any time.

Rebalance Calculator: Logic in the Couchbase Server Cluster Manager subsystem which calculates a Target vbucket Map. It takes as input the Current vbucket Map and the Pending Set. It calculates the optimal placement of vbuckets and returns the Target vbucket Map.

Rebalance Orchestrator: Logic within the Couchbase Server Cluster Manager (executed on the Node with the global singleton) which coordinates a Rebalancing process (primarily by issuing Migration Commands to individual servers in the cluster).

Replica Migration Tap Stream: A special type of tap stream that copies all data objects in a given vbucket to the server which requested the tap stream. The special behavior happens at the end of the iteration process and provides a rapid, but orderly, transfer of Replica Server status while maintaining data consistency.

Replica Server: Couchbase Server replicates object data (the number of Replicants is user-defined) to Replica Servers. Replica Servers can rapidly (within 100 msec) become the Master Server for a given key in case of original Master Server failure.

Replicant: A replica (backup) copy of an object stored in Couchbase Server.

Replication: The process of storing multiple copies of an object, across different servers, facilitating high-availability of any object stored in the cluster. Specifically, Replication supports rapid accessibility of an object, via the Couchbase Server Failover mechanism. Couchbase Server supports both Master-Slave and Peer-to-Peer replication topologies.
Tap Stream: A publish-and-subscribe mechanism allowing a subscribing server to request copies of all data objects associated with one or more vbuckets on the publishing server. There are a number of Tap Stream types allowing only subsets of the data to be streamed, based on time and other selection filters. Tap Streams are a core building block of Couchbase Server replication and dynamic cluster rebalancing.

Target vbucket Map: The vbucket Map that represents the state a cluster will be in once a currently running rebalance operation completes. The Rebalance Orchestrator compares the target and current maps to determine which Migration Tap Streams to create and supervise. The rebalance operation is complete when the Current and Target vbucket Maps are identical.

vbucket: A vbucket is the owner of a subset of the key space of a Couchbase Server cluster. Every key is contained within a vbucket. A mapping function is used to calculate the vbucket in which a given key belongs. In Couchbase Server the mapping function is a hash function that takes a key as input and outputs a vbucket identifier.

vbucket Map: A table identifying the servers acting as Master and Replica Servers for each vbucket. A server appearing in this table can be (and usually is) responsible for multiple vbuckets. The number of vbuckets in a Couchbase Server cluster must exceed the number of physical servers that may eventually be present in the cluster. In Couchbase Server, the vbucket map supports up to 1024 servers per cluster. See also Current vbucket Map and Target vbucket Map.
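To tie the vbucket Map and Failover entries together, here is a minimal sketch of what failover does to the map. It assumes each map entry is a list of the form [master, replica, ...], which is an illustrative representation and not the product's actual data structure; `fail_over` and the node names are hypothetical.

```python
def fail_over(vbucket_map, failed):
    """Return a new map with `failed` removed; a surviving replica becomes master."""
    new_map = []
    for vb, chain in enumerate(vbucket_map):
        survivors = [server for server in chain if server != failed]
        if not survivors:
            raise RuntimeError(f"vbucket {vb} has no surviving copy")
        new_map.append(survivors)  # survivors[0] now acts as master
    return new_map


# Three vbuckets with one replica each, spread over three hypothetical nodes.
vbucket_map = [
    ["node-a", "node-b"],
    ["node-b", "node-c"],
    ["node-c", "node-a"],
]
print(fail_over(vbucket_map, "node-a"))
```

Every vbucket that had a copy on the failed node ends up with one fewer copy, which is why a rebalance normally follows a failover to restore the full replica count.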
More informationBASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS
WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our
More informationAlfresco Enterprise on AWS: Reference Architecture
Alfresco Enterprise on AWS: Reference Architecture October 2013 (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper) Page 1 of 13 Abstract Amazon Web Services (AWS)
More informationEWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
More informationComparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
More informationSocial Networks and the Richness of Data
Social Networks and the Richness of Data Getting distributed Webservices Done with NoSQL Fabrizio Schmidt, Lars George VZnet Netzwerke Ltd. Content Unique Challenges System Evolution Architecture Activity
More informationModule 14: Scalability and High Availability
Module 14: Scalability and High Availability Overview Key high availability features available in Oracle and SQL Server Key scalability features available in Oracle and SQL Server High Availability High
More informationAmazon Cloud Storage Options
Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object
More informationHow to Choose your Red Hat Enterprise Linux Filesystem
How to Choose your Red Hat Enterprise Linux Filesystem EXECUTIVE SUMMARY Choosing the Red Hat Enterprise Linux filesystem that is appropriate for your application is often a non-trivial decision due to
More informationSoftware-Defined Networks Powered by VellOS
WHITE PAPER Software-Defined Networks Powered by VellOS Agile, Flexible Networking for Distributed Applications Vello s SDN enables a low-latency, programmable solution resulting in a faster and more flexible
More informationNew Features in SANsymphony -V10 Storage Virtualization Software
New Features in SANsymphony -V10 Storage Virtualization Software Updated: May 28, 2014 Contents Introduction... 1 Virtual SAN Configurations (Pooling Direct-attached Storage on hosts)... 1 Scalability
More informationReference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION
October 2013 Daitan White Paper Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION Highly Reliable Software Development Services http://www.daitangroup.com Cloud
More informationActive-Active ImageNow Server
Active-Active ImageNow Server Getting Started Guide ImageNow Version: 6.7. x Written by: Product Documentation, R&D Date: March 2014 2014 Perceptive Software. All rights reserved CaptureNow, ImageNow,
More informationChapter 2 TOPOLOGY SELECTION. SYS-ED/ Computer Education Techniques, Inc.
Chapter 2 TOPOLOGY SELECTION SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Topology selection criteria. Perform a comparison of topology selection criteria. WebSphere component
More informationAchieving Zero Downtime for Apps in SQL Environments
White Paper Achieving Zero Downtime for Apps in SQL Environments 2015 ScaleArc. All Rights Reserved. Introduction Whether unplanned or planned, downtime disrupts business continuity. The cost of downtime
More informationAmazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida
Amazon Web Services Primer William Strickland COP 6938 Fall 2012 University of Central Florida AWS Overview Amazon Web Services (AWS) is a collection of varying remote computing provided by Amazon.com.
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationScalability of web applications. CSCI 470: Web Science Keith Vertanen
Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches
More informationCloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
More informationHigh Availability Solutions for the MariaDB and MySQL Database
High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment
More informationLuxembourg June 3 2014
Luxembourg June 3 2014 Said BOUKHIZOU Technical Manager m +33 680 647 866 sboukhizou@datacore.com SOFTWARE-DEFINED STORAGE IN ACTION What s new in SANsymphony-V 10 2 Storage Market in Midst of Disruption
More information13.1 Backup virtual machines running on VMware ESXi / ESX Server
13 Backup / Restore VMware Virtual Machines Tomahawk Pro This chapter describes how to backup and restore virtual machines running on VMware ESX, ESXi Server or VMware Server 2.0. 13.1 Backup virtual machines
More information1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
More informationAchieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
More informationNew Features in PSP2 for SANsymphony -V10 Software-defined Storage Platform and DataCore Virtual SAN
New Features in PSP2 for SANsymphony -V10 Software-defined Storage Platform and DataCore Virtual SAN Updated: May 19, 2015 Contents Introduction... 1 Cloud Integration... 1 OpenStack Support... 1 Expanded
More informationLecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How
More informationInvestor Newsletter. Storage Made Easy Cloud Appliance High Availability Options WHAT IS THE CLOUD APPLIANCE?
Investor Newsletter Storage Made Easy Cloud Appliance High Availability Options WHAT IS THE CLOUD APPLIANCE? The SME Cloud Appliance is a software platform that enables companies to enhance their existing
More informationWhitepaper. NexentaConnect for VMware Virtual SAN. Full Featured File services for Virtual SAN
Whitepaper NexentaConnect for VMware Virtual SAN Full Featured File services for Virtual SAN Table of Contents Introduction... 1 Next Generation Storage and Compute... 1 VMware Virtual SAN... 2 Highlights
More informationSAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
More informationPARALLELS CLOUD SERVER
PARALLELS CLOUD SERVER An Introduction to Operating System Virtualization and Parallels Cloud Server 1 Table of Contents Introduction... 3 Hardware Virtualization... 3 Operating System Virtualization...
More informationINCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT
INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT UNPRECEDENTED OBSERVABILITY, COST-SAVING PERFORMANCE ACCELERATION, AND SUPERIOR DATA PROTECTION KEY FEATURES Unprecedented observability
More informationArchitecting for the cloud designing for scalability in cloud-based applications
An AppDynamics Business White Paper Architecting for the cloud designing for scalability in cloud-based applications The biggest difference between cloud-based applications and the applications running
More informationIn Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
More informationTOP FIVE REASONS WHY CUSTOMERS USE EMC AND VMWARE TO VIRTUALIZE ORACLE ENVIRONMENTS
TOP FIVE REASONS WHY CUSTOMERS USE EMC AND VMWARE TO VIRTUALIZE ORACLE ENVIRONMENTS Leverage EMC and VMware To Improve The Return On Your Oracle Investment ESSENTIALS Better Performance At Lower Cost Run
More informationAmazon EC2 Product Details Page 1 of 5
Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of
More informationwww.basho.com Technical Overview Simple, Scalable, Object Storage Software
www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...
More informationAssignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
More informationA Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
More informationEvaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group
NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with
More informationBest Practices for Installing and Configuring the Hyper-V Role on the LSI CTS2600 Storage System for Windows 2008
Best Practices Best Practices for Installing and Configuring the Hyper-V Role on the LSI CTS2600 Storage System for Windows 2008 Installation and Configuration Guide 2010 LSI Corporation August 13, 2010
More informationCloud Server. Parallels. An Introduction to Operating System Virtualization and Parallels Cloud Server. White Paper. www.parallels.
Parallels Cloud Server White Paper An Introduction to Operating System Virtualization and Parallels Cloud Server www.parallels.com Table of Contents Introduction... 3 Hardware Virtualization... 3 Operating
More informationBigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
More informationHelping MSPs protect Data Center resources
Helping MSPs protect Data Center resources Due to shrinking IT staffing and budgets, many IT organizations are turning to Service Providers for hosting of business-critical systems and applications (i.e.
More informationSCALABLE DATA SERVICES
1 SCALABLE DATA SERVICES 2110414 Large Scale Computing Systems Natawut Nupairoj, Ph.D. Outline 2 Overview MySQL Database Clustering GlusterFS Memcached 3 Overview Problems of Data Services 4 Data retrieval
More informationMigration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between
More informationAutomatic Service Migration in WebLogic Server An Oracle White Paper July 2008
Automatic Service Migration in WebLogic Server An Oracle White Paper July 2008 NOTE: The following is intended to outline our general product direction. It is intended for information purposes only, and
More informationDISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2
DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
More informationRingStor User Manual. Version 2.1 Last Update on September 17th, 2015. RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816.
RingStor User Manual Version 2.1 Last Update on September 17th, 2015 RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816 Page 1 Table of Contents 1 Overview... 5 1.1 RingStor Data Protection...
More informationMicrosoft Private Cloud Fast Track
Microsoft Private Cloud Fast Track Microsoft Private Cloud Fast Track is a reference architecture designed to help build private clouds by combining Microsoft software with Nutanix technology to decrease
More informationQuantum StorNext. Product Brief: Distributed LAN Client
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
More informationAzure Scalability Prescriptive Architecture using the Enzo Multitenant Framework
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should
More informationMigration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module
Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between
More informationFrequently Asked Questions
Frequently Asked Questions 1. Q: What is the Network Data Tunnel? A: Network Data Tunnel (NDT) is a software-based solution that accelerates data transfer in point-to-point or point-to-multipoint network
More informationRunning a Workflow on a PowerCenter Grid
Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
More informationManjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.
More informationRed Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Network (RHN) Satellite server is an easy-to-use, advanced systems management platform
More informationProduct Brochure. Hedvig Distributed Storage Platform Modern Storage for Modern Business. Elastic. Accelerate data to value. Simple.
Product Brochure Elastic Scales to petabytes of data Start with as few as two nodes and scale to thousands. Add capacity if and when needed. Embrace the economics of commodity x86 infrastructure to build
More informationRed Hat Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Satellite server is an easy-to-use, advanced systems management platform for your Linux infrastructure.
More informationMySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)
MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. erno@component.hu www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,
More informationManjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload
More informationHRG Assessment: Stratus everrun Enterprise
HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at
More informationMarkLogic Server Scalability, Availability, and Failover Guide
Scalability, Availability, and Failover Guide 1 MarkLogic 8 February, 2015 Last Revised: 8.0-1, February, 2015 Copyright 2015 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents
More informationOnline Transaction Processing in SQL Server 2008
Online Transaction Processing in SQL Server 2008 White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 provides a database platform that is optimized for today s applications,
More informationVMware vcloud Automation Center 6.1
VMware vcloud Automation Center 6.1 Reference Architecture T E C H N I C A L W H I T E P A P E R Table of Contents Overview... 4 What s New... 4 Initial Deployment Recommendations... 4 General Recommendations...
More informationNoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
More informationRealizing the True Potential of Software-Defined Storage
Realizing the True Potential of Software-Defined Storage Who should read this paper Technology leaders, architects, and application owners who are looking at transforming their organization s storage infrastructure
More informationMICROSOFT HYPER-V SCALABILITY WITH EMC SYMMETRIX VMAX
White Paper MICROSOFT HYPER-V SCALABILITY WITH EMC SYMMETRIX VMAX Abstract This white paper highlights EMC s Hyper-V scalability test in which one of the largest Hyper-V environments in the world was created.
More information<Insert Picture Here> Oracle In-Memory Database Cache Overview
Oracle In-Memory Database Cache Overview Simon Law Product Manager The following is intended to outline our general product direction. It is intended for information purposes only,
More informationVirtual SAN Design and Deployment Guide
Virtual SAN Design and Deployment Guide TECHNICAL MARKETING DOCUMENTATION VERSION 1.3 - November 2014 Copyright 2014 DataCore Software All Rights Reserved Table of Contents INTRODUCTION... 3 1.1 DataCore
More informationORACLE DATABASE 10G ENTERPRISE EDITION
ORACLE DATABASE 10G ENTERPRISE EDITION OVERVIEW Oracle Database 10g Enterprise Edition is ideal for enterprises that ENTERPRISE EDITION For enterprises of any size For databases up to 8 Exabytes in size.
More informationVMware vsphere Data Protection
VMware vsphere Data Protection Replication Target TECHNICAL WHITEPAPER 1 Table of Contents Executive Summary... 3 VDP Identities... 3 vsphere Data Protection Replication Target Identity (VDP-RT)... 3 Replication
More informationBASICS OF SCALING: LOAD BALANCERS
BASICS OF SCALING: LOAD BALANCERS Lately, I ve been doing a lot of work on systems that require a high degree of scalability to handle large traffic spikes. This has led to a lot of questions from friends
More informationFAWN - a Fast Array of Wimpy Nodes
University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed
More informationWelcome to the IBM Education Assistant module for Tivoli Storage Manager version 6.2 Hyper-V backups. hyper_v_backups.ppt.
Welcome to the IBM Education Assistant module for Tivoli Storage Manager version 6.2 Hyper-V backups. Page 1 of 21 You are familiar with Tivoli Storage Manager version 5.5 or higher. Page 2 of 21 When
More informationService Catalogue. virtual services, real results
Service Catalogue virtual services, real results September 2015 Table of Contents About the Catalyst Cloud...1 Get in contact with us... 2 Services... 2 Infrastructure services 2 Platform services 7 Management
More informationEvolution of Web Application Architecture International PHP Conference. Kore Nordmann / @koredn / <kore@qafoo.com> June 9th, 2015
Evolution of Web Application Architecture International PHP Conference Kore Nordmann / @koredn / June 9th, 2015 Evolution Problem Too many visitors Evolution Evolution Lessons Learned:
More informationNoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
More information