NoSQL Databases. Polyglot Persistence



Similar documents
NoSQL Databases. Nikos Parlavantzas

Cloud Scale Distributed Data Storage. Jürmo Mehine

Preparing Your Data For Cloud

Structured Data Storage

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Lecture Data Warehouse Systems

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.


A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

NoSQL Database Options

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

NoSQL Data Base Basics

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

The Quest for Extreme Scalability

Slave. Master. Research Scholar, Bharathiar University

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

INTRODUCTION TO CASSANDRA

Big Data Management and NoSQL Databases

Challenges for Data Driven Systems

Introduction to NOSQL

NoSQL. Thomas Neumann 1 / 22

NOSQL VS RDBMS - WHY THERE IS ROOM FOR BOTH

Choosing The Right Big Data Tools For The Job A Polyglot Approach

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Big Data Analytics. Rasoul Karimi

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

Understanding NoSQL on Microsoft Azure

CloudDB: A Data Store for all Sizes in the Cloud

Future-Proofing MySQL for the Worldwide Data Revolution

WINDOWS AZURE DATA MANAGEMENT

A Selection Method of Database System in Bigdata Environment: A Case Study From Smart Education Service in Korea

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

How To Scale Out Of A Nosql Database

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

An Approach to Implement Map Reduce with NoSQL Databases

Big Data Database Revenue and Market Forecast,

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Understanding NoSQL Technologies on Windows Azure

So What s the Big Deal?

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 2,

Databases 2 (VU) ( )

Can the Elephants Handle the NoSQL Onslaught?

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

Applications for Big Data Analytics

MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli

Database Management System as a Cloud Service

Advanced Data Management Technologies

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

NoSQL Database Systems and their Security Challenges

Database Usage in the Public and Private Cloud: Choices and Preferences

GigaSpaces Real-Time Analytics for Big Data

Tap into Hadoop and Other No SQL Sources

Open Source Technologies on Microsoft Azure

NoSQL in der Cloud Why? Andreas Hartmann

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Introduction. OUR MISSION We believe change is the only constant and we embrace it to provide world class services for our clients.

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

NoSQL for SQL Professionals William McKnight

multiparadigm programming Multiparadigm Data Storage for Enterprise Applications

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

A Survey on Cloud Storage Systems

MakeMyTrip CUSTOMER SUCCESS STORY

How To Write A Database Program

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Data Management in the Cloud

NoSQL Evaluation. A Use Case Oriented Survey

The CAP theorem and the design of large scale distributed systems: Part I

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services

Data sharing in the Big Data era

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

In Memory Accelerator for MongoDB

NoSQL and Graph Database

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Dominik Wagenknecht Accenture

Transcription:

The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler Pramod Sadalage Martin Fowler and Pramod Sadalage Rendered: February 8, 2012 11:26

SQL has Ruled for two decades Store persistent data Storing large amounts of data on disk, while allowing applications to grab the bits they need through queries Application Integration Many applications in an enterprise need to share information. By getting all applications to use the database, we ensure all these applications have consistent, up-todate data Concurrency Control Many users access the same information at the same time. Handling this concurrency is difficult to program, so databases provide transactions to help ensure consistent interaction. Mostly Standard The relational model is widely used and understood. Interaction with the database is done with SQL, which is a (mostly) standard language. This degree of standardization is enough to keep things familiar so people don t need to learn new things Reporting SQL s simple data model and standardization has made it a foundation for many reporting tools 2 All this supported by Big Database Vendors and the separation of the DBA profession.

but SQL s dominance is cracking Relational databases are designed to run on a single machine, so to scale, you need buy a bigger machine But it s cheaper and more effective to scale horizontally by buying lots of machines. SQL SQL 3 The machines in these large clusters are individually unreliable, but the overall cluster keeps working even as machines die - so the overall cluster is reliable. The cloud is exactly this kind of cluster, which means relational databases don t play well with the cloud. The rise of web services provides an effective alternative to shared databases for application integration, making it easier for different applications to choose their own data storage. Google and Amazon were both early adopters of large clusters, and both eschewed relational databases. Google Amazon Bigtable Dynamo Their efforts have been a large inspiration to the NoSQL community

so now we have NoSQL databases There is no standard definition of what NoSQL means. The term began with a workshop organized in 2009, but there is much argument about what databases can truly be called NoSQL. But while there is no formal definition, there are some common characteristics of NoSQL databases they don t use the relational data model, and thus don t use the SQL language they tend to be designed to run on a cluster they tend to be Open Source they don t have a fixed schema, allowing you to store any data in any record examples include We should also remember Google s Bigtable and Amazon s SimpleDB. While these are tied to their host s cloud service, they certainly fit the general operating characteristics 4

so this means we can Reduce Development Drag Embrace Large Scale A lot of effort in application development is tied up in working with relational databases. Although Object/ Relational Mapping frameworks have eased the load, the database is still a significant source of developer hours. Often we can reduce this effort by choosing an alternative database that s more suited to the problem domain. We often come across projects who are using relational databases because they are the default, not because they are the best choice for the job. Often they are paying a cost, in developer time and execution performance, for features they do not use. 5 Guardian New functionality uses Mongo rather than relational DB. They found Mongo s document data model significantly easier to interact with for their kind of application. [more...] DNC Searching 300 Million voters information for 1 person with addresses, emails, phones is tough with a relational data store. MongoDB was used to store the documents about the person. [more...] The large scale clusters that we can support with NoSQL databases allow us to store larger datasets (people are talking about petabytes these days) to process large amounts of analytic data. Alternative data models also allow us to carry out many tasks more efficiently, allowing us to tackle problems that we would have balked at when using only relational databases Danish Health Care Centralized record of drug prescriptions. Currently held in MySQL databases, but concerned about scale for both response time and availability. Migrated data to Riak. [more...] McLaren Streaming of telemetric data into MongoDB for later analysis. Orders of magnitude faster than relational (SQL Server). [more...]

but this does not mean relational is dead the relational model is still relevant The tabular model is suitable for many kinds of data, particularly when you need to pick apart data and re-assemble it in different ways for different purposes. ACID transactions In order to run effectively on a cluster, most NoSQL databases have limited transactional capability. Often this is enough but not always. Tools The long dominance of SQL means that many tools have been written to work with SQL databases. Tooling for alternative datastores is much more limited. Familiarity NoSQL systems are still new, so people aren t familiar with using them. So we shouldn t be using them on utility projects where their benefits would have less impact. 6

this leads us to a world of Polyglot Persistence using multiple data storage technologies, chosen based upon the way data is being used by individual applications. Why store binary images in relational database, when there are better storage systems? Polyglot persistence will occur over the enterprise as different applications use different data storage technologies. It will also occur within a single application as different parts of an application s data store have different access characteristics. http://martinfowler.com/bliki/polyglotpersistence.html 7

what might Polyglot Persistence look like? Rapid access for reads and writes. No need to be durable Needs transactional updates. Tabular structure fits data Needs high availability across multiple locations. Can merge inconsistent writes Rapidly traverse links between friends, product purchases, and ratings Speculative Retailers Web Application User sessions Financial Data Shopping Cart Recomendations Redis Product Catalog MongoDB RDBMS Reporting RDBMS Riak Analytics Cassandra Neo4J User activity logs Cassandra This is a very hypothetical example, we would not make technology recommendations without more contextual information 8 Lots of reads, infrequent writes. Products make natural aggregate SQL interfaces well with reporting tools Large-scale analytics on large cluster High volume of writes on multiple nodes

polyglot persistence provides lots of new opportunities for enterprises i.e. problems Decisions We have to decide what data storage technology to use, rather than just go with relational Immaturity NoSQL tools are still young, and full of the rough edges that new tools have. Furthermore since we don t have much experience with them, we don t know how to use them well, what the good patterns are, and what gotchas are lying in wait. Organizational Change How will our data groups react to this new technology? Dealing with Eventual Consistency Paradigm How will different stakeholders in the enterprise data deal with data that could be stale and how do you enforce rules to sync data across systems 9

what kinds of projects are candidates for polyglot persistence? rapid time to market If you need to get to market quickly, then you need to maximize productivity of your development team. If appropriate, polyglot persistence can remove significant drag. Strategic and and/or data intensive Most software projects are utility projects, i.e. they aren t central to the competitive advantage of the company. Utility projects should not take on the risk and staffing demands that polyglot persistence brings as the potential benefits are not there. 10 Data intensiveness can come in various forms lots of data high availability lots of traffic: reads or writes complex data relationships Any of these may suggest nonrelational storage, but its the exact nature of the data interaction that will suggest the best of the many alternatives.

for more information On the Web We are both active writers on our websites. Martin writes at http://martinfowler.com and Pramod at http://www.sadalage.com/. Forthcoming Book We are currently working on a introductory book to NoSQL databases, to be titled: NoSQL Distilled (see http://martinfowler.com/bliki/ NosqlDistilled.html) Consulting and Delivery ThoughtWorks has carried out several projects delivering production systems using NoSQL technologies. To see if NoSQL is a good fit for your needs, and how we can help your delivery, contact your local ThoughtWorks office, which you can find at http://thoughtworks.com 11