Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

By DataStax Corporation, August 2012

Contents

  Introduction
  The Growth in Multiple Data Centers
  Why Multi-Data Center Datastores?
  A Brief Multi-Data Center Database Checklist
  A Look at Apache Cassandra
  Cassandra and Multiple Data Centers
  Multi-Data Center Performance
  Running Apache Hadoop and Solr Across Multiple Data Centers
  Options for Multi-Data Center Hadoop and Solr
  A Look at DataStax Enterprise
  Multi-Data Center Support for Hadoop
  Multi-Data Center Support for Solr
  What About the Cloud?
  Managing and Monitoring Multi-Data Center Deployments
  Multi-Data Center Customer Examples
  Conclusion
  About DataStax

© 2012 DataStax. All rights reserved.

Introduction

Many modern businesses have external-facing database applications that are growing dramatically and that serve a geographically dispersed customer base. Numerous companies also have highly distributed workforces, with each employee needing fast access to the same corporate information no matter where they are located.

A database that easily spans multiple data centers and/or the cloud ensures the fastest possible response times (both read and write) for customers and employees who are geographically separated. A multi-data center database also provides other benefits, such as protecting information from loss if a single data center experiences a disaster.

This paper discusses why multi-data center databases are fast becoming the new norm for database operations, and what characteristics a database must possess to run across many data centers and the cloud at once. It then turns to how Apache Cassandra, Hadoop, and Solr can be configured to run across multiple data centers and cloud providers to meet the requirements of those who need a smart and agile datastore that is truly location independent.

The Growth in Multiple Data Centers

A 2012 article in InfoWorld reported interesting statistics about the rise and growth of multiple data centers. In its latest poll of data center managers, the Uptime Institute found that 80 percent of respondents had built a new data center or upgraded an existing facility within the past five years.[1] The same article cited another study of the North American data center market by Digital Realty Trust. In that study, 92 percent of respondents said their companies would definitely or probably expand their data center space in 2012, the highest percentage reported in six years.

This news, coupled with the fact that data centers are primarily put in place to hold (no surprise) corporate data, makes it plain that the need for databases that can easily span and interact between multiple data centers is only going to escalate, and likely at a rapid clip.

[1] "Large enterprises handing off data center builds as demand booms," by Ann Bednarz, InfoWorld, April 23, 2012.

Why Multi-Data Center Datastores?

The reasons why a multi-data center datastore is needed vary. Some use cases involve just the simple desire for a good disaster recovery plan. But the majority of multi-data center use cases revolve around keeping one logical database in sync across 1-N physical data centers and delivering the fastest possible response times to the users that each data center serves.

One other factor contributing to the multi-data center discussion is big data. Those familiar with the term can normally recite the three Vs that make up big data: velocity, volume, and variety. However, one overlooked aspect of big data systems is complexity, which, according to Gartner Inc., involves managing data across many different data centers, time zones, geographies, and so forth.[2]

Distributing data across many different data centers and the cloud is not an easy task with traditional databases. When the data also arrives at extremely high rates from many places, comes in varying formats, and involves heavy volumes, the job becomes even harder.

[2] "Big Data Is Only the Beginning of Extreme Information Management," by Beyer, et al., Gartner Group Inc., April 7, 2011.

A Brief Multi-Data Center Database Checklist

Even outside of big data environments, legacy relational databases (RDBMSs), the primary datastores for most businesses, have traditionally provided minimal support for multiple data centers. Other than basic replication or one-way mirroring, all RDBMS vendors lack key built-in features needed by modern applications that require a datastore spanning many different data centers and/or cloud geographies.

This raises the question: What features and capabilities does a modern database/datastore need to meet the demands of multi-data center operations? Does it just equate to log shipping, mirroring between data centers, or master-slave replication, or is it something else?

Increasingly, the must-have short list from those wanting modern multi-data center capabilities includes the following:

- The ability to span 1-N data centers, and not just two. This includes the agility to handle multiple cloud geo-zones as well.
- Multidirectional syncs between all participating data centers, and not just one way. In other words, truly location-independent, read-and-write-anywhere freedom.
- Built-in network intelligence, so that data is smartly transferred between data centers to minimize bandwidth overload and latency issues.
- The ability to support the required type of data traffic across data centers (e.g. real-time, analytic, search).
- Capabilities for handling big data use cases in a way where all data centers appear as just one logical database to an end-user application.

Pulling this off is not easy unless one starts with the right database architecture and feature set. The traditional master-slave designs inherent in RDBMSs and some NoSQL solutions are often impractical here, because they cannot meet the requirement for true location independence.

Fortunately, Apache Cassandra possesses the right blend of technical features and big data capabilities to handle modern multi-data center and cloud deployments.

A Look at Apache Cassandra

Apache Cassandra is a massively scalable NoSQL database. Cassandra's technical roots can be found at companies recognized for their ability to effectively tackle big data: Google, Amazon, and Facebook. Used today by numerous modern businesses to manage their critical data infrastructure, Cassandra is known as the solution technical professionals turn to when they need a real-time NoSQL database that supplies high performance at massive scale with continuous availability.

Rather than using a legacy master-slave design or a manual and difficult-to-maintain sharded design, Cassandra has a peer-to-peer distributed ring architecture that is more elegant and easier to set up and maintain. In Cassandra, all nodes are the same; there is no concept of a master node, and all nodes communicate with each other via a gossip protocol.

Cassandra's built-for-scale architecture means that it is capable of handling petabytes of information and thousands of concurrent users/operations per second across one to many data centers as easily as it can manage much smaller amounts of data and user traffic. It also means that, unlike master-slave or sharded systems, Cassandra has no single point of failure and is therefore capable of offering true continuous availability.
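To make the masterless ring idea concrete, here is a minimal sketch of how a set of identical nodes can each work out which peers hold a given key without asking a master. It is an illustration only: the MD5-based `token` function, the `Ring` class, and the node names are assumptions made for this example and are not Cassandra's actual partitioner or placement code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy masterless ring: every node runs the same lookup, so any node can route a request."""

    def __init__(self, nodes):
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, copies: int):
        """Return the distinct nodes found walking clockwise from the key's token."""
        copies = min(copies, len(self._ring))
        # First node at or after the key's token, wrapping around the end of the ring.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(copies)]


# Hypothetical node names; any node holding this membership list computes the same owners.
ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("customer:42", copies=3)
print(owners)
```

Because the lookup depends only on the shared membership list, no node is special, which is the property the peer-to-peer design relies on.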

Cassandra and Multiple Data Centers

Cassandra's architecture is tailor-made for multiple data centers. Its peer-to-peer design (vs. legacy master-slave implementations), coupled with online scale-out and full redundancy with no single point of failure, makes it ideal for multi-data center environments. Because Cassandra is a masterless architecture, all nodes are the same and all nodes offer full read/write capabilities in a database cluster, regardless of where those nodes are physically located.

A single Cassandra ring (or database cluster) can certainly exist at just one physical data center. However, Cassandra can easily support a single database spanning multiple data centers, where each data center holds its own copy of the database and can have as many nodes as needed to support that site:

Figure 1: A Single Cassandra Database with Multiple Data Centers

A database that spans multiple data centers is created in Cassandra simply by defining a new database. Once the database software has been installed and is running on all machines in all participating data centers, and network communication has been established among all the nodes, a keyspace (analogous to an RDBMS database) is created using Cassandra's CQL language. Within the definition of a keyspace, each data center is identified by a name that matches the configuration parameters previously set for the cluster, along with the number of copies of the data that the keyspace will hold in that data center.

For example, the statement below creates a new keyspace named Globalbiz that spans three data centers (DC1, DC2, and DC3), with the first and second each holding six copies of the data (for fault tolerance purposes) and the third holding three copies:

    CREATE KEYSPACE Globalbiz
      WITH strategy_class = 'NetworkTopologyStrategy'
      AND strategy_options:DC1 = 6
      AND strategy_options:DC2 = 6
      AND strategy_options:DC3 = 3;

Once this command executes successfully, all data in the keyspace is automatically and transparently replicated between the nodes in all data centers, with no further work necessary on the part of any developer or administrator.

Multi-Data Center Performance

One reason for multi-data center deployment is to keep copies of a database close to the users of a particular data center or geographic region, with the end result being faster performance for both reads and writes.

But what about performance across data centers? Won't updating many nodes in many different data centers put too heavy a load on a database cluster? To address this concern, Cassandra has built-in intelligence to send only a single data stream from one data center to each of the others participating in a multi-data center cluster. Once the data has reached one of the nodes in a different data center, that node takes responsibility for updating all the other nodes in its data center that hold that piece of data.

Figure 2: Cross-Data Center Writes in Cassandra
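The replication settings and the cross-data center write path described above can be sketched with a little arithmetic. The example below is a simplified counting model, not Cassandra's internal code: the helper names `create_keyspace_cql` and `plan_write` are hypothetical, and the map-style `replication` clause it prints is the form used by later CQL versions, so check it against the syntax of the Cassandra release actually in use.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using the map-style replication clause."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, count) for dc, count in replicas_per_dc.items()]
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(options))


def plan_write(local_dc, replicas_per_dc):
    """Count the messages for one write that is coordinated in local_dc (simplified model)."""
    if local_dc not in replicas_per_dc:
        raise ValueError("unknown data center: %s" % local_dc)
    remote_dcs = [dc for dc in replicas_per_dc if dc != local_dc]
    # One message crosses the wide-area link per remote data center.
    cross_dc_messages = len(remote_dcs)
    # The node that receives it forwards the write to the remaining replicas in its own data center.
    forwarded_locally = {dc: replicas_per_dc[dc] - 1 for dc in remote_dcs}
    # Without forwarding, the coordinator would contact every remote replica itself.
    naive_cross_dc = sum(replicas_per_dc[dc] for dc in remote_dcs)
    return cross_dc_messages, forwarded_locally, naive_cross_dc


# Replica counts taken from the Globalbiz example.
globalbiz = {"DC1": 6, "DC2": 6, "DC3": 3}
statement = create_keyspace_cql("Globalbiz", globalbiz)
cross, forwarded, naive = plan_write("DC1", globalbiz)
print(statement)
print(cross, forwarded, naive)
```

Under this model, a write coordinated in DC1 sends two messages across data centers instead of nine, with the remaining seven replica updates happening inside DC2 and DC3.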

Running Apache Hadoop and Solr Across Multiple Data Centers

In addition to managing real-time data across multiple data centers, many modern businesses also wish to run analytic and enterprise search operations that span more than one data center. The most popular open source options today are Apache Hadoop for analytic work and Apache Solr for enterprise search.

As with real-time data, implementing cross-data center operations for analytics and search data has proven to be no easy task.

Options for Multi-Data Center Hadoop and Solr

The need for multi-data center support for analytics and enterprise search has not been lost on those developing and supporting Hadoop and Solr. Today, Apache Hadoop offers a warm standby option that can be configured to go to a different data center. Third-party Hadoop vendors also offer solutions with one-way mirror capabilities.

For Solr, writes to Solr indexes in the community version cannot span multiple data centers. Instead, there is only replication support to another node in a different data center via rsync.

Both the open source versions of Hadoop and Solr and those offered by third-party software vendors miss the mark where the criteria for operating a datastore in a multi-data center environment are concerned. However, DataStax Enterprise, offered by DataStax, not only supplies multi-data center support for real-time data that meets the criteria suggested earlier in this paper, but also delivers the same enterprise support for multi-data center Hadoop and Solr.

A Look at DataStax Enterprise

DataStax is the most trusted provider of Cassandra, employing the Apache chair of the Cassandra project as well as most of the committers. For enterprises that want to use Cassandra in production, DataStax supplies DataStax Enterprise Edition, which includes an enterprise-ready version of Cassandra plus integration with Hadoop and Solr.

With DataStax Enterprise, modern businesses get a complete big data platform that contains:

- A certified version of Cassandra that has passed DataStax's rigorous internal certification process, which includes heavy quality assurance testing, performance benchmarking, and more.
- An integrated Apache Hadoop distribution for analytic operations that includes MapReduce, Hive, Pig, Mahout, and Sqoop support.
- Bundled enterprise search support with Apache Solr.
- An enterprise version of DataStax OpsCenter, a visual management and monitoring tool.
- Expert, 24x7x365 production support.
- Certified maintenance releases.

Multi-Data Center Support for Hadoop

Because DataStax Enterprise is built on Apache Cassandra, it inherits all of Cassandra's strengths where multi-data center support is concerned. In addition to multi-data center capabilities for real-time data management with Cassandra, users of DataStax Enterprise also get full cross-data center support for Hadoop and Solr.

Built into DataStax Enterprise is an enhanced Hadoop distribution that uses Cassandra for many of its core services. DataStax Enterprise provides integrated Hadoop MapReduce, Hive, Pig, Mahout, and Sqoop, replacing the Hadoop Distributed File System (HDFS) storage layer with Cassandra (the Cassandra File System, or CFS). The end product is a single integrated solution that provides increased reliability, simpler deployment, and lower total cost of ownership (TCO) than a traditional Hadoop solution. DataStax Enterprise is also fully compatible with existing HDFS and all Hadoop tools and utilities.

Another benefit of using Hadoop in DataStax Enterprise is that it eliminates the complexity and single points of failure of the typical HDFS layer. From an operational standpoint, there is no need to set up a Hadoop name node, secondary name node, Zookeeper, and so on.

From a multi-data center perspective, DataStax Enterprise also provides the ability to have a single Hadoop cluster run across as many data centers as desired. Data added to any Hadoop node in any data center is ready for use at all other sites. In addition, multiple CFSs and Hadoop job trackers can be configured so that each data center has its own local data and job trackers to work with, which increases performance.

Multi-Data Center Support for Solr

DataStax Enterprise includes strong enterprise search support via Lucene and Apache Solr. By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr's capabilities and delivers the following:

- An easily scalable search platform
- 100 percent data durability
- No single point of failure
- No write bottleneck
- Automatic data sharding
- Multi-data center capabilities
- Easy, ad-hoc index rebuilds
- The ability to query search data with Cassandra's CQL

In the same way that DataStax Enterprise takes Hadoop and delivers a continuously available, dynamically scalable, and multi-data center-capable Hadoop/analytics system, it automatically does the same for Solr and enterprise search operations. Using Cassandra as the underlying foundation, DataStax Enterprise allows search data to be written to any participating data center.

Those currently using Solr will be right at home with DataStax Enterprise. The solution is 100 percent Solr compatible, with all Solr utilities, APIs, and so on, included.

What About the Cloud?

Both Cassandra and DataStax Enterprise are fully cloud-enabled and capable of supporting multiple geo-zone sites in a cloud provider. Further, hybrid deployments are supported, so that a single cluster can span multiple on-premise installations as well as cloud-based implementations.

Figure 4: Cassandra supports hybrid on-premise/cloud deployments

Managing and Monitoring Multi-Data Center Deployments

Administering and monitoring the performance of any distributed database system can be challenging, especially when the database spans multiple geographical locations. However, DataStax makes it easy to manage multi-data center databases with DataStax OpsCenter.

DataStax OpsCenter is a visual management and monitoring solution for Cassandra and other big data technologies such as Apache Hadoop and Solr. Because DataStax OpsCenter is web-based, developers or administrators can easily manage and monitor all aspects of their databases from any desktop, laptop, or tablet without installing any client software. This includes databases that span multiple data centers and the cloud.

Figure 5: Managing a 9-node Cassandra cluster with DataStax OpsCenter

Multi-Data Center Customer Examples

Many modern businesses and organizations are using Cassandra for critical applications today. Here are just some examples:

Figure 6: A sample of companies and organizations using Cassandra in production

Some DataStax customers using Cassandra and DataStax Enterprise across multiple data centers and the cloud include:

- Netflix has over 500 nodes of Cassandra running in multiple clusters and geo-zones on Amazon.
- eBay has over 200 TB in DataStax Enterprise across three data centers.
- HealthX supports their online patient and provider portal with DataStax Enterprise running in multiple geographies on Amazon.
- ReachLocal uses DataStax Enterprise in six different data centers across the world to support their global online advertising business.
- Williams-Sonoma runs Cassandra across multiple sites to support their retail website operations.
- Pantheon Systems uses Cassandra across multiple data centers to deliver their cloud-based web development platform.
- Scandit runs Cassandra across three different data centers to support its mobile barcode and product scanning service.

Conclusion

Today's successful businesses are looking for a modern database management system that can easily span multiple data centers and handle real-time, analytic, and enterprise search operations. Cassandra and DataStax Enterprise meet the requirements these businesses have for multi-data center and cloud support.

To find out more about Cassandra and DataStax, and to obtain downloads of Cassandra and DataStax Enterprise software, please visit the DataStax website or send an email to info@datastax.com. Note that DataStax Enterprise Edition is completely free to use in development environments, while production deployments require the purchase of a software subscription.

About DataStax

DataStax, the commercial leader in Apache Cassandra, offers products and services that make it easy for customers to build, deploy, and operate big data applications. Over 190 customers use DataStax today, including leaders such as Netflix, Cisco, Rackspace, and Constant Contact, with industries served including web, financial services, telecommunications, logistics, and government.

DataStax is backed by industry-leading investors, including Lightspeed Venture and Crosslink, and is based in San Mateo, CA, with offices also in Austin, TX. For more information, visit the DataStax website.


More information

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014 Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability

More information

THE REALITIES OF NOSQL BACKUPS

THE REALITIES OF NOSQL BACKUPS THE REALITIES OF NOSQL BACKUPS White Paper Trilio Data, Inc. March 2015 1 THE REALITIES OF NOSQL BACKUPS TABLE OF CONTENTS INTRODUCTION... 2 NOSQL DATABASES... 2 PROBLEM: LACK OF COMPREHENSIVE BACKUP AND

More information

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

DataStax Enterprise, powered by Apache Cassandra (TM)

DataStax Enterprise, powered by Apache Cassandra (TM) PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies

More information

Introduction to Cassandra

Introduction to Cassandra Introduction to Cassandra DuyHai DOAN, Technical Advocate Agenda! Architecture cluster replication Data model last write win (LWW), CQL basics (CRUD, DDL, collections, clustering column) lightweight transactions

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Why Migrate from MySQL to Cassandra?

Why Migrate from MySQL to Cassandra? Why Migrate from MySQL to Cassandra? White Paper BY DATASTAX CORPORATION June 2012 1 Table of Contents Abstract 3 Introduction 3 Why Stay with MySQL 4 Why Migrate from MySQL? 4 Architectural Limitations

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

www.basho.com Technical Overview Simple, Scalable, Object Storage Software

www.basho.com Technical Overview Simple, Scalable, Object Storage Software www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

MakeMyTrip CUSTOMER SUCCESS STORY

MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

A Survey of Distributed Database Management Systems

A Survey of Distributed Database Management Systems Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

ScaleArc for SQL Server

ScaleArc for SQL Server Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

Get More Scalability and Flexibility for Big Data

Get More Scalability and Flexibility for Big Data Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

The Production Cloud

The Production Cloud The Production Cloud The cloud is not just for backup storage, development projects and other low-risk applications. In this document, we look at the characteristics of a public cloud environment that

More information

I/O Considerations in Big Data Analytics

I/O Considerations in Big Data Analytics Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief DDN Solution Brief Personal Storage for the Enterprise WOS Cloud Secure, Shared Drop-in File Access for Enterprise Users, Anytime and Anywhere 2011 DataDirect Networks. All Rights Reserved DDN WOS Cloud

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

HealthCare Anytime. As we approach the 2020s, the trend toward big data, tools, and systemization

HealthCare Anytime. As we approach the 2020s, the trend toward big data, tools, and systemization Datastax Provides with a Strategic Competitive Advantage as They Improve Patients Medical Care Executive Summary For more than 20 years, much of the national debate on reforming health care has focused

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence

More information

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper High Availability with Postgres Plus Advanced Server An EnterpriseDB White Paper For DBAs, Database Architects & IT Directors December 2013 Table of Contents Introduction 3 Active/Passive Clustering 4

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Big data blue print for cloud architecture

Big data blue print for cloud architecture Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges

More information

No-SQL Databases for High Volume Data

No-SQL Databases for High Volume Data Target Conference 2014 No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 The New Connected World Needs a Revolutionary New DBMS Today The Internet of Things 1990 s Mobile 1970 s Mainfram

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

T a c k l i ng Big Data w i th High-Performance

T a c k l i ng Big Data w i th High-Performance Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com T a c k l i ng Big Data w i th High-Performance Computing W H I T E P A

More information

DBA'S GUIDE TO NOSQL APACHE CASSANDRA

DBA'S GUIDE TO NOSQL APACHE CASSANDRA DBA'S GUIDE TO NOSQL APACHE CASSANDRA THE ENLIGHTENED DBA Smashwords Edition Copyright 2014 The Enlightened DBA This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or

More information

How To Make Data Streaming A Real Time Intelligence

How To Make Data Streaming A Real Time Intelligence REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Delivering Real-World Total Cost of Ownership and Operational Benefits

Delivering Real-World Total Cost of Ownership and Operational Benefits Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information