INTRODUCTION TO CASSANDRA



Similar documents
Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Introduction to Apache Cassandra

So What s the Big Deal?

How To Handle Big Data With A Data Scientist

How To Scale Out Of A Nosql Database

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

No-SQL Databases for High Volume Data

CitusDB Architecture for Real-Time Big Data

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Preparing Your Data For Cloud

GigaSpaces Real-Time Analytics for Big Data

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

NoSQL Database Options

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

An Approach to Implement Map Reduce with NoSQL Databases

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

NoSQL Data Base Basics

Using Cloud Services for Test Environments A case study of the use of Amazon EC2

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Big Data Technologies Compared June 2014

Big Data: Beyond the Hype

NoSQL for SQL Professionals William McKnight

Data Modeling for Big Data

How To Use Big Data For Telco (For A Telco)

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Navigating the Big Data infrastructure layer Helena Schwenk

Big Data: Beyond the Hype. Why Big Data Matters to You. White Paper

Big Systems, Big Data

Big Data: Beyond the Hype

Hadoop in the Hybrid Cloud

never 20X spike ClustrixDB 2nd Choxi (Formally nomorerack.com) Customer Success Story Reliability and Availability with fast growth in the cloud

Large scale processing using Hadoop. Ján Vaňo

The 3 questions to ask yourself about BIG DATA

Introduction to Multi-Data Center Operations with Apache Cassandra and DataStax Enterprise

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

From Spark to Ignition:

2.1.5 Storing your application s structured data in a cloud database

Big Data Processing: Past, Present and Future

Lecture Data Warehouse Systems

The Quest for Extreme Scalability

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Cloud Computing and Advanced Relationship Analytics

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

Protecting Big Data Data Protection Solutions for the Business Data Lake

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering

Cloud Scale Distributed Data Storage. Jürmo Mehine

Open Source Technologies on Microsoft Azure

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

The Inside Scoop on Hadoop

Hadoop. Sunday, November 25, 12

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER

THE REALITIES OF NOSQL BACKUPS

BIG DATA TRENDS AND TECHNOLOGIES

Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)

Information Architecture

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

NoSQL and Hadoop Technologies On Oracle Cloud

A Distributed Storage Schema for Cloud Computing based Raster GIS Systems. Presented by Cao Kang, Ph.D. Geography Department, Clark University

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Understanding NoSQL Technologies on Windows Azure

How to Choose Between Hadoop, NoSQL and RDBMS

Introduction to Multi-Data Center Operations with Apache Cassandra, Hadoop, and Solr WHITE PAPER

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Performance Testing of Big Data Applications

TOP 8 TRENDS FOR 2016 BIG DATA

InfiniteGraph: The Distributed Graph Database

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Introduction to NOSQL

A survey of big data architectures for handling massive data

Integrating Big Data into the Computing Curricula

Microsoft Azure Data Technologies: An Overview

Student Project 1 - Explorative Data Analysis with Hadoop and Spark

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Databases 2 (VU) ( )

Slave. Master. Research Scholar, Bharathiar University

The Modern Online Application for the Internet Economy: 5 Key Requirements that Ensure Success

Hadoop implementation of MapReduce computational model. Ján Vaňo

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Understanding NoSQL on Microsoft Azure

Transcription:

INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications.

WHAT IS CASSANDRA? Apache Cassandra is a high performance, open source database designed to handle very large amounts of data. It s an economical solution that provides a distributed architecture for continuous availability and no single point of failure, and offers superior reliability and scalability. Cassandra is a NoSQL database management system (DBMS), or a database that doesn t require that the data be in the tabular format required of traditional relational database management systems (RDBMS) and doesn t need to use standard SQL for data storage and retrieval. NoSQL databases offer a high level of flexibility, so IT departments are always ready to deal with new types of data as they emerge, regardless of volume or structure. This flexible structure makes them well suited for web and mobile applications, as well as the for Internet of Things (IoT) applications. Other NoSQL databases include HBase, MongoDB, and Couchbase. History of Cassandra Cassandra was developed at Facebook to support its Inbox search capabilities. It was offcially introduced as an open source technology in 2008 as Apache Cassandra, and a couple of years later, redistributed by DataStax as an enterprise offering. For companies that need more advanced capabilities and full product support, Datastax Enterprise extends Cassandra with built-in analytics, search, and security capabilities, along with formal training and product support.

How it s used Cassandra enables data storage across as many commodity servers as required in order to distribute workload and ensure performance levels that could not generally be guaranteed with a traditional RDBMS. This can greatly reduce the hardware costs associated with large server purchases and makes it easier to accommodate highvolume data writes. This distributed architecture ensures that there is no single point of failure as data is replicated across the Cassandra nodes, or instances. In other words, if one of the servers experiences an issue, the database can automatically redistribute the load so that neither access nor performance are impacted. This makes Cassandra an ideal solution for scale-out strategies as data volumes grow. For companies that are making use of cloud computing, Cassandra is an ideal solution. It integrates easily with cloud computing platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Compute Engine. While Cassandra is the superior platform for a high-volume data writes, for use as a solution for data retrieval and analytics you can pair it with a distributed storage and processing platform such as Spark or Hadoop. The Spark platform is particularly well suited for fast data retrieval because of its in-memory processing. Hadoop is recommended when you need to handle very large data volumes. Who s using Cassandra As Netflix grew in popularity, they needed to eliminate downtimes in their SQL database in order to ensure a positive and consistent user experience. Database issues and frequent schema changes were causing some services to become unavailable for up to 10 minutes at a time, which was not acceptable for the service they aimed to provide to their customers. Netflix now uses Cassandra running on Amazon Web Services (AWS) to store customer and media information such as ratings. This implementation has removed the possibility of any single point of failure, which ensures that they have a highly available service at all times.

ebay deployed Cassandra to track buying history and user activity. Using this information, they can tailor the ebay user experience for each individual customer. With Cassandra, their data store is always available, and the personalized shopping experience is never compromised even with the extremely high volumes of data that the ebay system must process each day. At Spotify, Cassandra manages information on customer playlists and product catalogs, which are being created at very high volumes as a growing number of users access the application. With this data, Spotify can personalize every user s experience and automatically present him/her with music playlists that are tailored to individual preferences. This represents just a small sample of the companies that use Cassandra and DataStax to support some critical aspect of their business. Across these examples, one commonality is the requirement for reliable performance in the face of massive amounts of data and the ability to scale as these demands grow. Practical Applications NoSQL databases can be loosely classified into four main categories: Column, Document, Key Value and Graph. Cassandra is unique among NoSQL solutions because it s a Columnar/Tabular database. This means data is stored in flexible columns and tables instead of rows, as with a RDBMS. With this type of database, storage is much less structured than a RDBMS, but it enables a high level of flexibility and exceptional performance. When considering whether Cassandra may be a good fit for your project, it s beneficial to understand that there are certain scenarios in which Cassandra is especially well suited. You may be considering a Cassandra deployment when you have: Data volumes that are measured in terabytes or petabytes A need to ingest or write the data at very high speed Data that needs to be geographically distributed

A requirement for high fault tolerance A need to reduce database costs At least some data that s unstructured or semi-structured (although not schemaless) A solution that must easily scale as data volumes grow A need for high performance Cassandra is best suited when fast write performance or storage is required -- for example, when collecting metrics or shopping cart data. This data can be analyzed at a later time, but analytics is not the primary application of Cassandra. Challenges and Considerations When an IT department decides to use Cassandra for a particular business application, the initial configuration of the database is extremely important to the overall success of the project. It is important that the data model is designed and implemented appropriately from the outset to ensure maximum scalability and performance. Also, it s critical to monitor the environment once it has been implemented. Because the database will continue to function regardless of node failures, it s necessary to detect and resolve failures as quickly as possible in order to keep the system at peak health. Since Cassandra does not rely on the data structure and query language of a traditional RDBMS, it requires a set of unique technical skills to ensure peak effciency and stability. These skills include in-depth experience in data modelling and administration, and an solid understanding of how to monitor the system. And since it is a relatively new technology, companies who have decided to use it may find it diffcult to attract and maintain administrators with the expertise required to manage this essential IT resource.

Achieving Success with Cassandra In order to best take advantage of the many benefits of deploying Cassandra and derive the most value in the shortest possible timeframe, it s a good idea to partner with someone who can provide highly knowledgeable and experienced Cassandra database experts. You need professionals who can step in and work closely with your team at any stage of your project. This approach allows you to address critical skills and capacity gaps within the team and get your Cassandra instance up and running quickly, and performing optimally as you move forward. Here are some key milestones of your implementation for which you may consider engaging specialized assistance: NoSQL education and technology selection Initial assessments against an existing Cassandra environment if it exists Cassandra installation and configuration Cassandra database modelling and development Operational support for your Cassandra environment If you re thinking of implementing Cassandra, watch our webinar, Getting Started With Cassandra, which covers key topics for starting out, such as when you should use Cassandra, potential challenges, real world Cassandra applications and benefits, and more. If you would like more information, visit www.pythian.com or contact us by email at info@pythian.com