Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise


White Paper by DataStax Corporation, October

Table of Contents
Abstract
Introduction
The Growth in Multiple Data Centers
A Brief Multi-Data Center Database Checklist
A Look at Apache Cassandra
Cassandra and Multiple Data Centers
Multi-Data Center Performance
Running Analytics and Search Across Multiple Data Centers
Options for Multi-Data Center Analytics and Search
A Look at DataStax Enterprise
What About the Cloud?
Managing and Monitoring Multi-Data Center Deployments
Multi-Data Center Customer Examples
Conclusion
About DataStax

Abstract

Many modern businesses serve customers all around the world, with database applications that need to remain available even if a disaster hits a particular region. A database that spans multiple data centers and/or the cloud can keep data close to geographically separated customers and employees, giving them fast response times. A multi-data center database also protects information from loss if a single data center experiences a disaster. This paper discusses why multi-data center databases are fast becoming the norm for database operations, and how Apache Cassandra and DataStax Enterprise can provide a smart and agile datastore that is truly location independent.

Introduction

Many modern businesses have external-facing database applications that are growing dramatically and that serve a geographically dispersed customer base. Numerous companies also have highly distributed workforces, with each employee needing fast access to the same corporate information no matter where they happen to be located.

A database that spans multiple data centers and/or the cloud can deliver fast response times (for both reads and writes) to customers and employees who are geographically separated, because each group is served by a nearby copy of the data. A multi-data center database also provides other benefits, such as protecting information from loss if a single data center experiences a disaster.

This paper discusses why multi-data center databases are fast becoming the new norm for database operations, and what characteristics a database must possess to run across many data centers and the cloud at once. It then turns to how Apache Cassandra and DataStax Enterprise can be configured to run across multiple data centers and cloud providers, meeting the requirements of those who need a smart and agile datastore that is truly location independent.

The Growth in Multiple Data Centers

A 2012 article in InfoWorld revealed interesting statistics about the rise of multiple data centers. In its latest poll of data center managers, the Uptime Institute found that 80 percent of respondents had built a new data center or upgraded an existing facility within the past five years.[1] The same article cited another study of the North American data center market, done by Digital Realty Trust. In that study, 92 percent of respondents said their companies would definitely or probably expand their data center space in 2012, the highest percentage reported in six years. These findings, coupled with the fact that data centers exist primarily to hold (no surprise) corporate data, make it plain that the need for databases that can easily span and interact across multiple data centers will only escalate, and likely at a rapid clip.

[1] "Large enterprises handing off data center builds as demand booms," by Ann Bednarz, InfoWorld, April 23, 2012.

Why Multi-Data Center Datastores?

The reasons a multi-data center datastore is needed vary. Some use cases involve just the simple desire for a good disaster recovery plan. But the majority of multi-data center use cases revolve around keeping one logical database synchronized across 1 to N physical data centers while delivering the fastest possible response times to the users that each data center serves.

One other factor contributing to the multi-data center discussion is big data. Those familiar with the term can normally recite the three Vs of big data: velocity, volume, and variety. However, one overlooked aspect of big data systems is complexity, which, according to Gartner Inc., involves managing data across many different data centers, time zones, geographies, and so forth.[2] Distributing data across many different data centers and the cloud is not an easy task with traditional databases. When that data also arrives at extremely high speed from many places, comes in varying formats, and involves heavy volumes, the job becomes even harder.

A Brief Multi-Data Center Database Checklist

Even outside of big data environments, legacy relational databases (RDBMSs), the primary datastores for most businesses, have traditionally provided minimal support for multiple data centers. Beyond basic replication or one-way mirroring, RDBMS vendors lack key built-in features needed by modern applications that require a datastore spanning many different data centers and/or cloud geographies. This raises the question: what features and capabilities does a modern database or datastore need to meet the demands of multi-data center operations? Does it just equate to log shipping, mirroring between data centers, or master-slave replication, or is it something else?

Increasingly, the must-have short list from those wanting modern multi-data center capabilities includes the following:

- The ability to span 1 to N data centers, not just two. This includes the agility to handle multiple cloud geo-zones as well.
- Multidirectional syncs between all participating data centers, not just one-way replication; in other words, truly location-independent, read-and-write-anywhere freedom.
- Built-in network intelligence, so that data is transferred smartly between data centers to minimize bandwidth overload and latency issues.
- The ability to support the required type of data traffic across data centers (e.g., real-time, analytic, search).
- Capabilities for handling big data use cases in a way where all data centers appear as one logical database to an end-user application.

[2] "Big Data Is Only the Beginning of Extreme Information Management," by Beyer et al., Gartner Group Inc., April 7, 2011.

Pulling this off is not easy unless one starts with the right database architecture and feature set. The traditional master-slave designs inherent in RDBMSs and some NoSQL solutions often make it practically impossible, because the requirement for true location independence cannot be met when writes must flow through a single master. Fortunately, Apache Cassandra possesses the right blend of technical features and big data capabilities to handle modern multi-data center and cloud deployments.

A Look at Apache Cassandra

Apache Cassandra is a massively scalable NoSQL database. Cassandra's technical roots can be found at companies recognized for their ability to tackle big data effectively: Google, Amazon, and Facebook. Used today by numerous modern businesses to manage their critical data infrastructure, Cassandra is known as the solution technical professionals turn to when they need a real-time NoSQL database that supplies high performance at massive scale and never goes down.

Rather than using a legacy master-slave design or a manual, difficult-to-maintain sharded design, Cassandra has a peer-to-peer (or "masterless") distributed ring architecture that is elegant and easy to set up and maintain. In Cassandra, all nodes are the same; there is no concept of a master node, and all nodes communicate with each other via a gossip protocol.

Cassandra's built-for-scale architecture means that it can handle terabytes of information and thousands of concurrent users and operations per second across one or many data centers as easily as it can manage much smaller amounts of data and user traffic. It also means that, unlike master-slave or sharded systems, Cassandra has no single point of failure and can therefore offer true continuous availability.
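
To make the masterless, gossip-based design more concrete, the following is a heavily simplified, hypothetical Python model of the idea, not Cassandra's implementation. Each node keeps a view mapping every peer it knows about to a (version, status) pair; in each round every node does a push-pull exchange with one randomly chosen peer, and the higher-versioned entry wins. Cassandra's real gossiper uses a multi-step digest handshake and adds failure detection, none of which is modeled here, and the node names are invented for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

# A node's view of the cluster: peer name -> (heartbeat version, status).
# This data model is a simplification for illustration only.
View = Dict[str, Tuple[int, str]]


def merge(local: View, remote: View) -> None:
    """Adopt every remote entry that is newer (higher version) than ours."""
    for peer, (version, status) in remote.items():
        if peer not in local or local[peer][0] < version:
            local[peer] = (version, status)


def gossip_until_converged(
    nodes: List[str], seed: int = 42, max_rounds: int = 100
) -> Tuple[Optional[int], Dict[str, View]]:
    """Run push-pull gossip rounds; return (rounds needed or None, final views)."""
    rng = random.Random(seed)
    # No master and no central registry: each node starts knowing only itself.
    views: Dict[str, View] = {n: {n: (1, "UP")} for n in nodes}
    for round_no in range(1, max_rounds + 1):
        for node in nodes:
            peer = rng.choice([n for n in nodes if n != node])
            merge(views[peer], views[node])  # push our view to the peer
            merge(views[node], views[peer])  # pull the peer's (now merged) view
        first = views[nodes[0]]
        if all(view == first for view in views.values()):
            return round_no, views
    return None, views


# Hypothetical six-node, two-data-center cluster.
NODES = ["dc1-n1", "dc1-n2", "dc1-n3", "dc2-n1", "dc2-n2", "dc2-n3"]
rounds, views = gossip_until_converged(NODES)
print(f"all nodes share the same view after {rounds} round(s)")
```

Because every node runs the same loop and any node can exchange state with any other, no single node is required for the cluster to learn its own membership, which is the property the masterless design relies on.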

Cassandra and Multiple Data Centers

Cassandra's architecture is tailor-made for multiple data centers. Its peer-to-peer design (vs. legacy master-slave implementations), coupled with online scale-out and full redundancy with no single point of failure, makes it ideal for multi-data center environments. Because Cassandra is a masterless architecture, all nodes are the same, and every node offers full read/write capabilities in a database cluster, regardless of where it is physically located.

A single Cassandra ring (or database cluster) can certainly exist at just one physical data center. However, Cassandra can also easily support a single database spanning multiple data centers, where each data center holds its own copy of the database and can have as many nodes as needed to support that site:

Figure 2: A Single Cassandra Database with Multiple Data Centers

Creating a database that spans multiple data centers in Cassandra is straightforward and is accomplished by defining a keyspace. Once the database software has been installed and is running on all machines in all participating data centers, and network communication has been established among all the nodes, a keyspace (analogous to a database in an RDBMS) is created using the Cassandra Query Language (CQL). Within the definition of a keyspace, each data center is identified by name (the name must match the data center name previously set in the nodes' configuration), along with the number of copies of the data (the replication factor) that the keyspace will hold in that data center.

For example, the statement below creates a new keyspace named Globalbiz across three data centers (DC1, DC2, and DC3), with the first and second data centers each holding six copies of the data (for fault tolerance) and the third holding three copies:

CREATE KEYSPACE Globalbiz WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 6, 'DC2': 6, 'DC3': 3};

Once this statement executes successfully, all data written to the keyspace is automatically and transparently replicated to the specified number of nodes in each data center, with no further work required on the part of any developer or administrator.
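
To illustrate what per-data center replication factors like those in the Globalbiz keyspace mean in practice, here is a minimal Python sketch of placing replicas on a token ring: starting at the key's position, walk the ring clockwise and take the first N nodes seen in each data center. This is an assumption-laden simplification of NetworkTopologyStrategy, not its implementation: it gives each node a single token, ignores racks and virtual nodes, uses MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3), and the node counts and names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple

Ring = List[Tuple[int, str, str]]  # (token, node name, data center)


def token_for(value: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner uses Murmur3.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def build_ring(nodes_per_dc: Dict[str, int]) -> Ring:
    """Assign one token per (hypothetically named) node and sort them into a ring."""
    ring = [
        (token_for(f"{dc.lower()}-n{i}"), f"{dc.lower()}-n{i}", dc)
        for dc, count in nodes_per_dc.items()
        for i in range(1, count + 1)
    ]
    return sorted(ring)


def place_replicas(ring: Ring, key: str, replication: Dict[str, int]) -> Dict[str, List[str]]:
    """Walk clockwise from the key's token, taking the first RF nodes seen in each DC."""
    tokens = [token for token, _, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    chosen: Dict[str, List[str]] = {dc: [] for dc in replication}
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if dc in chosen and len(chosen[dc]) < replication[dc]:
            chosen[dc].append(node)
    # If a DC has fewer nodes than its replication factor, it simply gets fewer replicas here.
    return chosen


# Same per-DC replication factors as the Globalbiz keyspace; node counts are hypothetical.
ring = build_ring({"DC1": 8, "DC2": 8, "DC3": 4})
replicas = place_replicas(ring, "customer:1001", {"DC1": 6, "DC2": 6, "DC3": 3})
for dc, nodes in replicas.items():
    print(dc, nodes)
```

Note that each data center ends up with its own complete set of copies for the key, which is what lets any site serve reads and writes locally.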

Multi-Data Center Performance

One reason for multi-data center deployment is to keep copies of a database close to the users of a particular data center or geographic region, with the end result being faster performance for both reads and writes. But what about performance across data centers? Won't updating many nodes in many different data centers put too heavy a load on a database cluster?

To address this concern, Cassandra has built-in intelligence to send only a single copy of each write from one data center to each of the other data centers participating in a multi-data center cluster, rather than one copy per remote replica. Once the data reaches a node in a remote data center, that node takes responsibility for updating all other nodes in that data center that are responsible for holding that piece of data.

Figure 3: Cross-Data Center Writes in Cassandra

Running Analytics and Search Across Multiple Data Centers

In addition to managing real-time data across multiple data centers, many modern businesses also wish to run analytic and enterprise search operations that span more than one data center. As with real-time data, implementing cross-data center operations for analytics and search data has proven to be no easy task.

Options for Multi-Data Center Analytics and Search

The need for multi-data center support for analytics and enterprise search has not been lost on those developing and supporting analytics and search technologies like Apache Hadoop and Apache Solr. Today, Apache Hadoop offers a warm standby option that can be placed in a different data center, and third-party Hadoop vendors offer solutions with one-way mirroring capabilities. In the community version of Solr, writes to Solr indexes cannot span multiple data centers; the only support is replication to another node in a different data center via rsync.
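
The single-stream cross-data center write behavior described under Multi-Data Center Performance can be illustrated by counting messages. The Python sketch below is a simplified, hypothetical model, not Cassandra's code: the coordinator contacts local replicas directly and, with forwarding enabled, sends one message to a single replica in each remote data center, which then relays the write to the other replicas there. The choice of relay node, the node names, and the replica sets (sized like the Globalbiz keyspace) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# (sender, receiver, crosses a data center boundary?)
Message = Tuple[str, str, bool]


def route_write(coordinator: str, coordinator_dc: str,
                replicas: Dict[str, List[str]], forward: bool = True) -> List[Message]:
    """List the messages one write generates under a simplified model."""
    messages: List[Message] = []
    for dc, nodes in replicas.items():
        if dc == coordinator_dc:
            # Local replicas are reached directly; the coordinator needs no message to itself.
            messages += [(coordinator, n, False) for n in nodes if n != coordinator]
        elif forward and nodes:
            # One WAN message per remote DC; the receiving replica relays it locally.
            relay = nodes[0]  # hypothetical choice of relay node
            messages.append((coordinator, relay, True))
            messages += [(relay, n, False) for n in nodes[1:]]
        else:
            # Naive alternative: the coordinator contacts every remote replica itself.
            messages += [(coordinator, n, True) for n in nodes]
    return messages


# Replica sets sized like the Globalbiz keyspace (6 / 6 / 3); node names are hypothetical.
EXAMPLE_REPLICAS = {
    "DC1": [f"dc1-n{i}" for i in range(1, 7)],
    "DC2": [f"dc2-n{i}" for i in range(1, 7)],
    "DC3": [f"dc3-n{i}" for i in range(1, 4)],
}

for mode in (True, False):
    msgs = route_write("dc1-n1", "DC1", EXAMPLE_REPLICAS, forward=mode)
    wan = sum(1 for _, _, crosses in msgs if crosses)
    print(f"forward={mode}: {len(msgs)} messages, {wan} cross-DC")
```

With these inputs, both modes deliver the write to the same 14 remote nodes, but forwarding cuts the messages that cross a data center boundary from 9 to 2, which is where the bandwidth and latency savings come from.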

Both the open source versions of Hadoop and Solr and those offered by third-party software vendors miss the mark where the criteria for operating a datastore in a multi-data center environment are concerned. DataStax Enterprise, by contrast, supplies not only multi-data center support that meets the criteria suggested earlier in this paper for real-time/online data, but also the same enterprise support for running analytics and search on Cassandra data across multiple data centers.

A Look at DataStax Enterprise

DataStax is the most trusted provider of Cassandra, employing the Apache chair of the Cassandra project as well as most of the committers. For enterprises that want to use Cassandra in production, DataStax supplies DataStax Enterprise Edition, which includes an enterprise-ready version of Cassandra plus built-in security and the ability to run analytics and enterprise search operations on Cassandra data. With DataStax Enterprise, modern businesses get a complete big data platform that contains:

- A certified version of Cassandra that has passed DataStax's rigorous internal certification process, which includes heavy quality assurance testing, performance benchmarking, and defect resolution.
- Integrated analytics on Cassandra data using Hadoop MapReduce, Hive, Pig, Mahout, and Sqoop.
- Bundled enterprise search support with Apache Solr.
- Automatic management services that transparently run and take care of many administration tasks without IT staff involvement.
- DataStax OpsCenter, a visual management and monitoring tool.
- Expert, 24x7x365 support.
- Certified maintenance releases and platform certification.

What About the Cloud?

Both Cassandra and DataStax Enterprise are fully cloud-enabled and capable of supporting multiple availability zones in a cloud provider. Further, hybrid deployments are supported, so that a single cluster can span multiple on-premises installations as well as cloud-based implementations.

Figure 4: Cassandra supports hybrid on-premises/cloud deployments

Managing and Monitoring Multi-Data Center Deployments

Administering and monitoring the performance of any distributed database system can be challenging, especially when the database spans multiple geographical locations. DataStax simplifies the management of multi-data center databases with DataStax OpsCenter, a visual management and monitoring solution for Cassandra. Because DataStax OpsCenter is web-based, developers and administrators can manage and monitor all aspects of their databases, including databases that span multiple data centers and the cloud, from any desktop, laptop, or tablet without installing any client software.

Figure 5: Managing a 9-node Cassandra cluster with DataStax OpsCenter

Multi-Data Center Customer Examples

Figure 6: A sample of companies and organizations using Cassandra in production

DataStax customers using Cassandra and DataStax Enterprise across multiple data centers and the cloud include:

- Netflix has over 500 nodes of Cassandra running in multiple clusters and geo-zones on Amazon.
- eBay has over 200 TB in DataStax Enterprise across three data centers.
- HealthX supports its online patient and provider portal with DataStax Enterprise running in multiple geographies on Amazon.
- ReachLocal uses DataStax Enterprise in six different data centers across the world to support its global online advertising business.
- Pantheon Systems uses Cassandra across multiple data centers to deliver its cloud-based web development platform.
- Scandit runs Cassandra across three different data centers to support its mobile barcode and product scanning service.

Conclusion

Today's successful businesses are looking for a modern database management system that can easily span multiple data centers and handle real-time, analytic, and enterprise search operations. Cassandra and DataStax Enterprise meet the requirements these businesses have for multi-data center and cloud support.

To find out more about Cassandra and DataStax, and to download Cassandra and DataStax Enterprise software, please visit the DataStax website or send an email to info@datastax.com. Note that DataStax Enterprise Edition is completely free to evaluate in development environments, while production deployments require the purchase of a software subscription.

About DataStax

DataStax powers the big data applications that transform business for more than 300 customers, including startups and 20 of the Fortune 100. DataStax delivers a massively scalable, flexible, and continuously available big data platform built on Apache Cassandra. DataStax integrates enterprise-ready Cassandra and includes the ability to run analytics and search on Cassandra data across multiple data centers and in the cloud. Companies such as Adobe, Healthcare Anytime, eBay, and Netflix rely on DataStax to transform their businesses. Based in San Mateo, Calif., DataStax is backed by industry-leading investors: Lightspeed Venture Partners, Crosslink Capital, and Meritech Capital Partners. For more information, visit DataStax.


More information

The Production Cloud

The Production Cloud The Production Cloud The cloud is not just for backup storage, development projects and other low-risk applications. In this document, we look at the characteristics of a public cloud environment that

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Big Data & the Cloud: The Sum Is Greater Than the Parts

Big Data & the Cloud: The Sum Is Greater Than the Parts E-PAPER March 2014 Big Data & the Cloud: The Sum Is Greater Than the Parts Learn how to accelerate your move to the cloud and use big data to discover new hidden value for your business and your users.

More information

Vistara Lifecycle Management

Vistara Lifecycle Management Vistara Lifecycle Management Solution Brief Unify IT Operations Enterprise IT is complex. Today, IT infrastructure spans the physical, the virtual and applications, and crosses public, private and hybrid

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

No-SQL Databases for High Volume Data

No-SQL Databases for High Volume Data Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine

More information

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Delivering Real-World Total Cost of Ownership and Operational Benefits

Delivering Real-World Total Cost of Ownership and Operational Benefits Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

A Survey of Distributed Database Management Systems

A Survey of Distributed Database Management Systems Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,

More information

OPTIMIZING PERFORMANCE IN AMAZON EC2 INTRODUCTION: LEVERAGING THE PUBLIC CLOUD OPPORTUNITY WITH AMAZON EC2. www.boundary.com

OPTIMIZING PERFORMANCE IN AMAZON EC2 INTRODUCTION: LEVERAGING THE PUBLIC CLOUD OPPORTUNITY WITH AMAZON EC2. www.boundary.com OPTIMIZING PERFORMANCE IN AMAZON EC2 While the business decision to migrate to Amazon public cloud services can be an easy one, tracking and managing performance in these environments isn t so clear cut.

More information

How to Unlock Agility by Backing up to, from, and in the Cloud

How to Unlock Agility by Backing up to, from, and in the Cloud WHITE PAPER: HOW TO UNLOCK AGILITY BY BACKING UP TO, FROM,....... AND.... IN.. THE.... CLOUD....................... How to Unlock Agility by Backing up to, from, and in the Cloud Who should read this paper

More information

Get More Scalability and Flexibility for Big Data

Get More Scalability and Flexibility for Big Data Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and

More information

The Multi-Model Database Cloud Applications in a Complex World

The Multi-Model Database Cloud Applications in a Complex World The Multi-Model Database Cloud Applications in a Complex World Table of Contents INTRODUCTION MULTI-MODEL: AN EVOLUTIONARY TALE FROM RDBMS TO NOSQL TO MULTI-MODEL DATASTAX ENTERPRISE AND MULTI-MODEL DECIDING

More information

Real-World Scale for Mobile IT: Nine Core Performance Requirements

Real-World Scale for Mobile IT: Nine Core Performance Requirements White Paper Real-World Scale for Mobile IT: Nine Core Performance Requirements Mobile IT Scale As the leader in Mobile IT, MobileIron has worked with hundreds of Global 2000 companies to scale their mobile

More information

www.basho.com Technical Overview Simple, Scalable, Object Storage Software

www.basho.com Technical Overview Simple, Scalable, Object Storage Software www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Practical Guidelines for Selecting NoSQL vs. an RDBMS Deployment Considerations Conclusion About DataStax

Practical Guidelines for Selecting NoSQL vs. an RDBMS Deployment Considerations Conclusion About DataStax TABLE OF CONTENTS Introduction Why NoSQL? NoSQL 101 Types of NoSQL Databases What are the Advantages of NoSQL Over an RDBMS? A NoSQL Example Apache Cassandra What Makes Cassandra Ideal for Modern Online

More information

7 INSIGHTS FROM OUR 2014 CLOUD ADOPTION SURVEY

7 INSIGHTS FROM OUR 2014 CLOUD ADOPTION SURVEY 1 7 INSIGHTS FROM OUR 2014 CLOUD ADOPTION SURVEY THE NEW INDUSTRY PULSE ON CLOUD MIGRATION We asked nearly 200 IT professionals in industries ranging from healthcare and government to finance and media/

More information

Webinar: Modern Data Protection For Next-Gen Apps and Databases

Webinar: Modern Data Protection For Next-Gen Apps and Databases Enterprise Strategy Group Getting to the bigger truth. Webinar: Modern Data Protection For Next-Gen Apps and Databases Nik Rouda, Senior Analyst, ESG Group Tarun Thakur, Co-Founder and CEO, Datos IO Speakers

More information

MakeMyTrip CUSTOMER SUCCESS STORY

MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

To run large data set applications in the cloud, and run them well,

To run large data set applications in the cloud, and run them well, How to Harness the Power of DBaaS and the Cloud to Achieve Superior Application Performance To run large data set applications in the cloud, and run them well, businesses and other organizations have embraced

More information

Security and Compliance in Big Data

Security and Compliance in Big Data Security and Compliance in Big Data White Paper BY DATASTAX CORPORATION AND GAZZANG, INC MAY 2013 Contents Executive Summary 3 A Brief Note About Compliance 3 HIPAA and HITECH Regulations 4 Payment Card

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

T a c k l i ng Big Data w i th High-Performance

T a c k l i ng Big Data w i th High-Performance Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com T a c k l i ng Big Data w i th High-Performance Computing W H I T E P A

More information

Cloud Computing Backgrounder

Cloud Computing Backgrounder Cloud Computing Backgrounder No surprise: information technology (IT) is huge. Huge costs, huge number of buzz words, huge amount of jargon, and a huge competitive advantage for those who can effectively

More information

DataStax Enterprise Reference Architecture. White Paper

DataStax Enterprise Reference Architecture. White Paper DataStax Enterprise Reference Architecture White Paper BY DATASTAX CORPORATION January 2014 Table of Contents Abstract...3 Introduction...3 DataStax Enterprise Architecture...3 Management Interface...

More information

Things You Need to Know About Cloud Backup

Things You Need to Know About Cloud Backup Things You Need to Know About Cloud Backup Over the last decade, cloud backup, recovery and restore (BURR) options have emerged as a secure, cost-effective and reliable method of safeguarding the increasing

More information

WHITE PAPER. 5 Ways Your Organization is Missing Out on Massive Opportunities By Not Using Cloud Software

WHITE PAPER. 5 Ways Your Organization is Missing Out on Massive Opportunities By Not Using Cloud Software WHITE PAPER 5 Ways Your Organization is Missing Out on Massive Opportunities By Not Using Cloud Software Cloud software allows your organization to focus on its strengths and outsource tough data storage

More information

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra A Quick Reference Configuration Guide Kris Applegate kris_applegate@dell.com Solution Architect Dell Solution Centers Dave

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

ScaleArc idb Solution for SQL Server Deployments

ScaleArc idb Solution for SQL Server Deployments ScaleArc idb Solution for SQL Server Deployments Objective This technology white paper describes the ScaleArc idb solution and outlines the benefits of scaling, load balancing, caching, SQL instrumentation

More information

Leveraging Public Clouds to Ensure Data Availability

Leveraging Public Clouds to Ensure Data Availability Systems Engineering at MITRE CLOUD COMPUTING SERIES Leveraging Public Clouds to Ensure Data Availability Toby Cabot Lawrence Pizette The MITRE Corporation manages federally funded research and development

More information

Table of Contents Abstract Introduction The Expanding Digitization of Business The Core of the Internet Enterprise

Table of Contents Abstract Introduction The Expanding Digitization of Business The Core of the Internet Enterprise 1 Table of Contents Abstract... Introduction... Definition... The Expanding Digitization of Business... The Core of the Internet Enterprise... Requirements leading to radical change... Success Factors

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Create and Drive Big Data Success Don t Get Left Behind

Create and Drive Big Data Success Don t Get Left Behind Create and Drive Big Data Success Don t Get Left Behind The performance boost from MapR not only means we have lower hardware requirements, but also enables us to deliver faster analytics for our users.

More information

Making the Business and IT Case for Dedicated Hosting

Making the Business and IT Case for Dedicated Hosting Making the Business and IT Case for Dedicated Hosting Overview Dedicated hosting is a popular way to operate servers and devices without owning the hardware and running a private data centre. Dedicated

More information

RPO represents the data differential between the source cluster and the replicas.

RPO represents the data differential between the source cluster and the replicas. Technical brief Introduction Disaster recovery (DR) is the science of returning a system to operating status after a site-wide disaster. DR enables business continuity for significant data center failures

More information

Solution brief. HP CloudSystem. An integrated and open platform to build and manage cloud services

Solution brief. HP CloudSystem. An integrated and open platform to build and manage cloud services Solution brief An integrated and open platform to build and manage cloud services The industry s most complete cloud system for enterprises and service providers Approximately every decade, technology

More information