Using Object Database db4o as Storage Provider in Voldemort



Similar documents
Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

Tier Architectures. Kathleen Durant CS 3200

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

The Sierra Clustered Database Engine, the technology at the heart of

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Cloud Scale Distributed Data Storage. Jürmo Mehine

Integrating Big Data into the Computing Curricula

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

LARGE-SCALE DATA STORAGE APPLICATIONS

Preparing Your Data For Cloud

In Memory Accelerator for MongoDB

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

An Approach to Implement Map Reduce with NoSQL Databases

Practical Cassandra. Vitalii

How To Write A Database Program

Object Relational Database Mapping. Alex Boughton Spring 2011

Advanced Data Management Technologies

Cassandra A Decentralized, Structured Storage System

Lecture Data Warehouse Systems

Structured Data Storage

Object-Oriented Databases db4o: Part 2

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

How to Choose Between Hadoop, NoSQL and RDBMS

The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Dynamo: Amazon s Highly Available Key-value Store

ZooKeeper. Table of contents

Comparing SQL and NOSQL databases

Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

f...-. I enterprise Amazon SimpIeDB Developer Guide Scale your application's database on the cloud using Amazon SimpIeDB Prabhakar Chaganti Rich Helms

NoSQL Data Base Basics

Cloud Computing Is In Your Future

Lag Sucks! R.J. Lorimer Director of Platform Engineering Electrotank, Inc. Robert Greene Vice President of Technology Versant Corporation

Technical Overview Simple, Scalable, Object Storage Software

Table Of Contents. 1. GridGain In-Memory Database

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

MADOCA II Data Logging System Using NoSQL Database for SPring-8

Social Networks and the Richness of Data

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

The Classical Architecture. Storage 1 / 36

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

FAQs. This material is built based on. Lambda Architecture. Scaling with a queue. 8/27/2015 Sangmi Pallickara

Cloud Computing with Microsoft Azure

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

White Paper. Optimizing the Performance Of MySQL Cluster

GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH

Removing Failure Points and Increasing Scalability for the Engine that Drives webmd.com

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

FAWN - a Fast Array of Wimpy Nodes

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Distributed Systems. Tutorial 12 Cassandra

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Distributed Data Stores

MongoDB Developer and Administrator Certification Course Agenda

Slave. Master. Research Scholar, Bharathiar University

Recovery Principles in MySQL Cluster 5.1

Raima Database Manager Version 14.0 In-memory Database Engine

Cloud Computing at Google. Architecture

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

MySQL Storage Engines

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

Data sharing in the Big Data era

Hypertable Architecture Overview

InfiniteGraph: The Distributed Graph Database

NoSQL and Hadoop Technologies On Oracle Cloud

A Survey of Distributed Database Management Systems

Understanding NoSQL Technologies on Windows Azure

Apache Flink Next-gen data analysis. Kostas

Advanced Object Oriented Database access using PDO. Marcus Börger

Scalable Web Programming. CS193S - Jan Jannink - 1/12/10

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Oracle Architecture, Concepts & Facilities

NoSQL Database Systems and their Security Challenges

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

Can the Elephants Handle the NoSQL Onslaught?

Big Systems, Big Data

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

A survey of big data architectures for handling massive data

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Apache Hadoop. Alexandru Costan

How To Scale Out Of A Nosql Database

ONOS Open Network Operating System

Transcription:

Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) <german@db4o.com> September 2010 Abstract: In this article I will show you how easy it is to add support for Versant s object database, db4o in project Voldemort, an Apache licensed distributed key-value storage system used at LinkedIn which is useful for certain high-scalability storage problems where simple functional partitioning is not sufficient (Note: Voldemort borrows heavily from Amazon's Dynamo, if you're interested in this technology all the available papers about Dynamo should be useful). Voldemort s storage layer is completely mockable so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster or, as we will show next, even a real storage engine like db4o for simple testing. Note: All db4o related files that are part of this project are publicly available on GitHub: http://github.com/germanviscuso/voldemort/tree/master/contrib/db4o/ Index: 1. Voldemort in a Nutshell 2. Key/Value Storage System a. Pros and Cons 3. Logical Architecture 4. db4o API in a Nutshell 5. db4o as Key/Value Storage Provider 6. db4o and BerkeleyDB side by side a. Constructors b. Fetch by Key c. Store Key/Value Pair d. Status of Unit Tests 7. Conclusion Voldemort in a Nutshell In one sentence Voldemort is basically a big, distributed, persistent, fault-tolerant hash table designed to tradeoff consistency for availability. It's implemented as a highly available key-value storage system for services that need to provide an always-on experience. To achieve this level of availability, Voldemort sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and

application-assisted conflict resolution. Voldemort targets applications that operate with weak/eventual consistency (the C in ACID) if this results in high availability even under occasional network failures. It does not provide any isolation guarantees and permits only single key updates. It is well known that when dealing with the possibility of network failures, strong consistency and high data availability cannot be achieved simultaneously. The system uses a series of known techniques to achieve scalability and availability: data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. It employs a gossip based distributed failure detection and membership protocol and is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from without requiring any manual partitioning or redistribution. Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. However, given the current complex scenario of persistence solutions in the industry we ll see that different breeds can be combined to offer solutions that become the right tool for the job at hand. In this case we ll see how a highly scalable NoSQL solution such as Voldemort can use a fast Java based embedded object database (Versant s db4o) as the underlying persistence mechanism. Voldemort provides the scalability while db4o provides fast key/value pair persistence. Key/Value Storage System To enable high performance and availability Voldemort allows only very simple key-value data access. Both keys and values can be complex compound objects including lists or maps, but none-the-less the only supported queries are effectively the following: value = store.get( key ) store.put( key, value ) store.delete( key ) This is clearly not good enough for all storage problems, there are a variety of trade-offs: Cons no complex query filters all joins must be done in code no foreign key constraints no triggers/callbacks

Pros only efficient queries are possible, very predictable performance easy to distribute across a cluster service-orientation often disallows foreign key constraints and forces joins to be done in code anyway (because key refers to data maintained by another service) using a relational db you need a caching layer to scale reads, the caching layer typically forces you into key-value storage anyway often end up with xml or other denormalized blobs for performance anyway clean separation of storage and logic (SQL encourages mixing business logic with storage operations for efficiency) no object-relational miss-match, no mapping Having a three operation interface makes it possible to transparently mock out the entire storage layer and unit test using a mock-storage implementation that is little more than a HashMap. This makes unit testing outside of a particular container or environment much more practical and was certainly instrumental in simplifying the db4o storage engine implementation. Logical Architecture In Voldemort, each storage node has three main software component groups: request coordination, membership and failure detection, and a local persistence engine. All these components are implemented in Java. If we take a closer look we ll see that each layer in the code implements a simple storage interface that does put, get, and delete. Each of these layers is responsible for performing one function such as TCP/IP network communication, serialization, version reconciliation, inter-node routing, etc. For example the routing layer is responsible for taking an operation, say a PUT, and delegating it to all the N storage replicas in parallel, while handling any failures.

Voldemort s local persistence component allows for different storage engines to be plugged in. Engines that are in use are Oracle s Berkeley Database (BDB), Oracle's MySQL, and an in-memory buffer with persistent backing store. The main reason for designing a pluggable persistence component is to choose the storage engine best suited for an application s access patterns. For instance, BDB can handle objects typically in the order of tens of kilobytes whereas MySQL can handle objects of larger sizes. Applications choose Voldemort s local persistence engine based on their object size distribution. The majority of Voldemort s production instances currently use BDB. In this article we ll provide details about the creation of a new storage engine (yellow box layer on the image above) that uses db4o instead of Oracle s Berkeley DB (BDB), Oracle s MySQL or just memory. Moreover we ll show that db4o s performance is on-par with BDB (if not better) while introducing a significant reduction in the implementation complexity. db4o API in a Nutshell db4o s basic API is not far away from Voldemort s API in terms of simplicity but it s not limited to the storage of key/value pairs. db4o is a general purpose object persistence engine that can deal with the peculiarities of native POJO persistence that exhibit arbitrarily complex object

references. The very basic API looks like this: container.store( object ) container.delete( object ) objectset = container.querybyexample( prototype ) Two more advanced querying mechanisms are available and we will introduce one of them (SODA) on the following sections. As you can see, in order to provide an interface for key/value storage (what Voldemort expects) we ll have to provide an intermediate layer which will take requests via the basic Voldemort API and will translate that into db4o API calls. db4o as Key/Value Storage Provider db4o fits seamlessly into the Voldemort picture as a storage provider because: db4o allows free form storage of objects with no special requirements on object serialization which results in a highly versatile storage solution which can easily be configured to store from low level objects (eg byte arrays) to objects of arbitrarily complexity (both for keys and values on key/value pairs) db4o's b-tree based indexing capabilities allow for quick retrieval of keys while acting as a key/value pair storage provider db4o s simple query system allows the retrieval of key/value pairs with one-line of code db4o can persist versioned objects directly which frees the developer from having to use versioning wrappers on stored values db4o can act as a simpler but powerful replacement to Oracle s Berkeley DB (BDB) because the complexity of the db4o storage provider is closer to Voldemort's in memory storage provider than the included BDB provider which results in faster test and development cycles (remember Voldemort is work in progress). As a first step in the implementation we provide a generic class to expose db4o as a key/value storage provider which is not dependant on Voldemort's API (also useful to db4o developers who need this functionality in any application). The class is voldemort.store.db4o.db4okeyvalueprovider<key, Value> and relies on Java generics so it can be used with any sort of key/value objects. This class implements the following methods which are more or less self explanatory considering they operate on a store of key/value pairs: voldemort.store.db4o.db4okeyvalueprovider.keyiterator() voldemort.store.db4o.db4okeyvalueprovider.getkeys() voldemort.store.db4o.db4okeyvalueprovider.getvalues(key) voldemort.store.db4o.db4okeyvalueprovider.pairiterator()

voldemort.store.db4o.db4okeyvalueprovider.get(key) voldemort.store.db4o.db4okeyvalueprovider.delete(key) voldemort.store.db4o.db4okeyvalueprovider.delete(key, Value) voldemort.store.db4o.db4okeyvalueprovider.delete(db4okeyvaluepair<key, Value>) voldemort.store.db4o.db4okeyvalueprovider.put(key, Value) voldemort.store.db4o.db4okeyvalueprovider.put(db4okeyvaluepair<key, Value>) voldemort.store.db4o.db4okeyvalueprovider.getall() voldemort.store.db4o.db4okeyvalueprovider.getall(iterable<key>) voldemort.store.db4o.db4okeyvalueprovider.truncate() voldemort.store.db4o.db4okeyvalueprovider.commit() voldemort.store.db4o.db4okeyvalueprovider.rollback() voldemort.store.db4o.db4okeyvalueprovider.close() voldemort.store.db4o.db4okeyvalueprovider.isclosed() Second, we implement a db4o StorageProvider following Voldemort's API, namely voldemort.store.db4o.db4ostorageengine which define the key (K) as ByteArray and the value (V) as byte[] for key/value classes. Since this class inherits from abstract class voldemort.store.storageengine<k, V> it must (and it does) implement the following methods: voldemort.store.db4o.db4ostorageengine.getversions(k) voldemort.store.db4o.db4ostorageengine.get(k) voldemort.store.db4o.db4ostorageengine.getall(iterable<k>) voldemort.store.db4o.db4ostorageengine.put(k, Versioned<V>) voldemort.store.db4o.db4ostorageengine.delete(k, Version) We also implement the class that handles the db4o storage configuration: voldemort.store.db4o.db4ostorageconfiguration This class is responsible for providing all the initialization parameters when the db4o database is created (eg. index creation on the key (K) field for fast retrieval of key/value pairs) Third and last we provide three support classes: voldemort.store.db4o.db4okeyvaluepair<key, Value> : a generic key/value pair class. Instances of this class will be what ultimately gets stored in the db4o database. voldemort.store.db4o.db4okeysiterator<key, Value> : an Iterator implementaion that allows to iterate over the keys (K). voldemort.store.db4o.db4oentriesiterator<key, Value> : an Iterator implementaion that allows to iterate over the values (V). and a few test classes: voldemort.catdb4ostore

voldemort.store.db4o.db4ostorageenginetest (this is the main test class) voldemort.store.db4o.db4osplitstorageenginetest db4o and BerkeleyDB side by side Let's take a look at db4o's simplicity by comparing the matching methods side by side with BDB (now both can act as Voldemort's low level storage provider). Constructors: BdbStorageEngine() vs Db4oStorageEngine() The db4o constructor get rids of the serializer object because it can store objects of any complexity with no transformation. The code: this.versionserializer = new Serializer<Version>() { public byte[] tobytes(version object) { return ((VectorClock) object).tobytes(); public Version toobject(byte[] bytes) { return versionedserializer.getversion(bytes); ; is no longer necessary and, of course, this impacts all storage operations because db4o requires no conversions to bytes and to object back and forth: BDB (fetch by key): DatabaseEntry keyentry = new DatabaseEntry(key.get()); DatabaseEntry valueentry = new DatabaseEntry(); List<T> results = Lists.newArrayList(); for(opstatus status = cursor.getsearchkey(keyentry, valueentry, lockmode); status == OperationStatus.SUCCESS; status = cursor.getnextdup(keyentry, valueentry, lockmode)) { results.add(serializer.toobject(valueentry.getdata())); return results; db4o (fetch by key) Query query = getcontainer().query(); query.constrain(db4okeyvaluepair.class); query.descend("key").constrain(key); return query.execute(); in this example we use SODA, a low level and powerful graph based query system where you basically build a query tree and pass it to db4o for execution.

BDB (store key/value pair) DatabaseEntry keyentry = new DatabaseEntry(key.get()); boolean succeeded = false; transaction = this.environment.begintransaction(null, null); // Check existing values // if there is a version obsoleted by this value delete it // if there is a version later than this one, throw an exception DatabaseEntry valueentry = new DatabaseEntry(); cursor = getbdbdatabase().opencursor(transaction, null); for(opstatus status=cursor.getsearchkey(keyentry,valueentry,lockmode.rmw); status == OperationStatus.SUCCESS; status = cursor.getnextdup(keyentry, valueentry, LockMode.RMW)) { VectorClock clock = new VectorClock(valueEntry.getData()); Occured occured = value.getversion().compare(clock); if(occured == Occured.BEFORE) throw new ObsoleteVersionException(); else if(occured == Occured.AFTER) // best effort delete of obsolete previous value! cursor.delete(); // Okay so we cleaned up all the prior stuff, so now we are good to // insert the new thing valueentry = new DatabaseEntry(versionedSerializer.toBytes(value)); OperationStatus status = cursor.put(keyentry, valueentry); if(status!= OperationStatus.SUCCESS) throw new PersistenceFailException("Put operation failed:" + status); succeeded = true; db4o (store key/value pair) boolean succeeded = false; candidates = keyvalueprovider.get(key); for(db4okeyvaluepair<bytearray, Versioned<byte[]>> pair: candidates) { Occured occured = value.getversion().compare(pair.getvalue().getversion() ); if(occured == Occured.BEFORE) throw new ObsoleteVersionException(); else if(occured == Occured.AFTER) // best effort delete of obsolete previous value! keyvalueprovider.delete(pair); // Okay so we cleaned up all the prior stuff, so we can now insert

try { keyvalueprovider.put(key, value); catch(db4oexception de) { throw new PersistenceFailException("Put op failed:" + de.getmessage()); succeeded = true; Status of Unit Tests Let's take a look at the current status of unit tests in for the db4o storage provider (Db4oStorageEngineTest): 1. testpersistence: single key/value pair storage and retrieval with storage engine close operation in between (it takes some time because this is the first test and the Voldemort system is kick started). 2. testequals: test that retrieval of storage engine instance by name gets you the same instance. 3. testnullconstructorparameters: null constructors for storage engine instantiation are illegal. Arguments are storage engine name and database configuration. 4. testsimultaneousiterationandmodification: threaded test of simultaneous puts (inserts), deletes and iteration over pairs (150 puts and 150 deletes before start of iteration). 5. testgetnoentries: test that empty storage empty returns zero pairs. 6. testgetnokeys: test that empty storage empty returns zero keys. 7. testkeyiterationwithserialization: test storage and key retrieval of 5 pairs serialized as Strings. 8. testiterationwithserialization: test storage and retrieval of 5 pairs serialized as Strings. 9. testpruneonwrite: stores 3 versions for one key and tests prune (overwriting of previous versions should happen). 10. testtruncate: stores 3 pairs and issues a truncate database operation. Then verifies db is empty. 11. testemptybytearray: stores 1 pair with zeroed key and tests correct retrieval. 12. testnullkeys: test that basic operations (get, put, getall, delete,etc) fail with a null key parameter (5 operations on pairs total). 13. testputnullvalue: test that put operation works correctly with a null value (2 operations on pairs total, put and get). 14. testgetanddeletenonexistentkey: test that key of non persistent pair doesn't return a value. 15. testfetchedequalsput: stores 1 pair with complex version and makes sure only one entry is stored and retrieved. 16. testversionedput: tests that obsolete or equal versions of a value can't be stored. Test correct storage of incremented version (13 operations on pairs total). 17. testdelete: puts 2 pairs with conflicting versions and deletes one (7 operations on pairs total).

18. testgetversions: Test retrieval of different versions of pair (3 operations on pairs total). 19. testgetall: puts 10 pairs and tests if all can be correctly retrieved at once (getall) 20. testgetallwithabsentkeys: same as before but with non persistent keys (getall returns 0 pairs). 21. testcloseisidempotent: tests that a second close does not result in error. Test db4o (ms) BDB (ms) testpersistence 6.189 4.711 testequals 0.789 0.246 testnullconstructorparameters 0.356 0.199 testsimultaneousiterationandmo dification 2.103 0.503 testgetnoentries 0.034 0.064 testgetnokeys 0.227 1.82 testkeyiterationwithserialization 0.757 1.026 testiterationwithserialization 0.27 0.176 testpruneonwrite 0.581 1.179 testtruncate 0.244 0.048 testemptybytearray 0.031 0.117 testnullkeys 0.204 1.318 testputnullvalue 0.175 0.24 testgetanddeletenonexistentk ey 0.159 0.155 testfetchedequalsput 0.236 1.551 testversionedput 0.356 0.076 testdelete 0.384 0.279 testgetversions 0.306 0.403 testgetall 0.315 3.361 testgetallwithabsentkeys 0.537 0.682 testcloseisidempotent 0.048 0.111 Total time 14.301 18.265

Conclusion db4o provides a low level implementation of a key/value storage engine that is both simpler than BDB and on-par (if not better) in performance. Moreover, the implementation shows that different persistence solutions such as the typical highly scalable, eventually consistent NoSQL engine and object databases can be combined to provide a solution that becomes the right tool for the job at hand which makes typical versus arguments obsolete. Finally the short implementation path taken to make db4o act as a reliable key/value storage provider show the power and versatility of native embedded object databases.