Securing Big Data at Rest with Encryption for Hadoop, Cassandra and MongoDB on Red Hat.




Alex Gonzalez, Software Engineer

Content

- What is Big Data
- 3 important NoSQL players
- Who uses Big Data
- Use cases
- Encryption solutions and demo
- Navigator Encrypt performance
- MongoDB, Hadoop and Cassandra encryption
- Conclusion

What is Big Data?

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.

3 important NoSQL players

Hadoop: a framework that allows for the distributed processing of large data sets across clusters of computers.
Cassandra: a database with high availability, linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure.
MongoDB: a scalable, high-performance, highly available open source database designed for document-oriented storage.


Application areas

- Business Intelligence, Analytics & Performance Management
- Advertising, Sales & Marketing
- Advertising Network or Exchange
- Monitoring and Security
- Social
- Education and Training
- Data and Document Management (Financial, Health, etc.)
- Music
- Video
- Gaming

Open source encryption solutions

dm-crypt: a transparent disk encryption subsystem.
eCryptfs: a POSIX-compliant enterprise cryptographic stacked filesystem for Linux.

Both are supported on Ubuntu, SLES, Red Hat, Debian and CentOS. Red Hat 7.x and CentOS 7.x no longer support eCryptfs.
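
As a rough illustration of what a dm-crypt setup involves, the sketch below prints the usual cryptsetup sequence for a LUKS volume. The device /dev/sdX, the mapping name secure_data and the mount point /mnt/secure are hypothetical placeholders, not values from the talk. The script only prints the commands (a dry run), because luksFormat destroys any existing data on the device.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would set up a dm-crypt (LUKS) volume.
# DEVICE, NAME and MOUNTPOINT are hypothetical placeholders.
DEVICE="${DEVICE:-/dev/sdX}"
NAME="${NAME:-secure_data}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/secure}"

dmcrypt_plan() {
    # 1. Initialise the LUKS header (destroys existing data on the device).
    echo "cryptsetup luksFormat $DEVICE"
    # 2. Unlock the device; the decrypted view appears under /dev/mapper/.
    echo "cryptsetup luksOpen $DEVICE $NAME"
    # 3. Create a filesystem on the decrypted mapping and mount it.
    echo "mkfs.ext4 /dev/mapper/$NAME"
    echo "mount /dev/mapper/$NAME $MOUNTPOINT"
}

dmcrypt_plan
```

Running the printed commands as root, one at a time, would give an encrypted block device whose contents are transparent to applications once mounted.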

eCryptfs and dm-crypt demo

Cloudera Navigator Encrypt

Provides massively scalable, high-performance encryption for sensitive data. It uses industry-standard AES-256 encryption and provides a transparent layer between the application and the filesystem.

Navigator Encrypt Performance

Performance cost is ~5% to ~10%.

Benchmark parameters: { nThreads: 32, fileSizeMB: 1000, r: true }

Threads running   Not-encrypted              Encrypted                  Performance cost
1                 2380 ops/sec, 9 MB/sec     2479 ops/sec, 9 MB/sec     4.15%
2                 3011 ops/sec, 11 MB/sec    3160 ops/sec, 11 MB/sec    4.94%
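
The slide does not state how the performance cost was computed. The percentages shown are consistent with taking the difference between the two throughput figures, dividing by the smaller one, and truncating to two decimals; the sketch below assumes that formula. The function name perf_cost is made up for illustration.

```shell
#!/bin/sh
# Relative difference between two ops/sec figures, as a percentage of the
# smaller one, truncated (not rounded) to two decimals.
# Assumed formula; the slide does not document how its numbers were derived.
perf_cost() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        lo = (a < b) ? a : b
        hi = (a < b) ? b : a
        printf "%.2f\n", int((hi - lo) / lo * 10000) / 100
    }'
}

perf_cost 2380 2479   # figures from the 1-thread run
perf_cost 3011 3160   # figures from the 2-thread run
```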

MongoDB Demo

Navigator Encrypt Profiles (DMCRYPT)

Navigator Encrypt works differently when creating ACLs for Java processes: the binary that gets executed is always the Java executable, and that executable can be launched with different jars. For Java you therefore need to specify a profile, which contains all the options Java receives when it is executed. The profile lets you control which Java application may access the data.
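
To see why the executable alone cannot tell Java applications apart, it helps to compare a process's short command name with its full command line. The sketch below reads both from /proc, so it is Linux-specific; the function name show_process_identity is made up for illustration, and this is not how Navigator Encrypt itself is implemented.

```shell
#!/bin/sh
# Show the short command name and the full command line of a process.
# Linux-specific: relies on /proc/<pid>/comm and /proc/<pid>/cmdline.
show_process_identity() {
    pid="$1"
    # comm holds only the executable name (for any JVM this is just "java").
    printf 'comm:    %s\n' "$(cat "/proc/$pid/comm")"
    # cmdline is NUL-separated; turn the separators into spaces for display.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
}

# Example: inspect the current shell.
show_process_identity "$$"
```

For two different Java services the first line would be identical, while the second line carries the options and main class that distinguish them.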

Hadoop Encryption: Navigator Encrypt Profiling - Obtaining the PID

    [root@hdfs-2 ~]# ps aux | grep datanode
    hdfs 7910 0.5 3.3 1649284 257040 ? Sl 11:41 0:25 /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_datanode -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-cmf-HDFS-1-DATANODE-hdfs-2.vpc.cloudera.com.log.out -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode

Hadoop Encryption: Navigator Encrypt Profiling

    [root@hdfs-2 ~]# navencrypt-profile -p 7910
    {
    "uid":"496",
    "comm":"java",
    "cmdline":"/usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_datanode -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-cmf-HDFS-1-DATANODE-hdfs-2.vpc.cloudera.com.log.out -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode"
    }
    [root@hdfs-2 ~]# navencrypt-profile -p 7910 > profile.txt
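
Before attaching a saved profile to an ACL, it may be worth confirming that the captured command line really belongs to the intended service. The sketch below is one simple way to do that with grep; the helper name profile_mentions is made up, and the check is an optional extra step, not part of the Navigator Encrypt workflow shown in the talk.

```shell
#!/bin/sh
# Return success if the profile file ($1) contains the fixed string ($2).
# -F treats the pattern literally; -- stops option parsing so that patterns
# starting with "-" (such as JVM flags) are accepted.
profile_mentions() {
    grep -F -q -- "$2" "$1"
}

# Example: check for the DataNode marker flag seen in the ps output.
if [ -f profile.txt ]; then
    if profile_mentions profile.txt '-Dproc_datanode'; then
        echo "profile.txt looks like a DataNode profile"
    else
        echo "profile.txt does not mention -Dproc_datanode"
    fi
fi
```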

Hadoop Encryption: Adding a Navigator Encrypt ACL

    [root@hdfs-2 ~]# navencrypt acl --add --rule="allow @hdfs * /usr/java/jdk1.7.0_67-cloudera/bin/java" --profile=profile.txt
    Type MASTER passphrase:
    1 rule(s) were added
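
Judging from the command above, a rule string has four space-separated fields: the rule type (allow), the category prefixed with @, a path pattern, and the process binary. The sketch below assembles such a string and prints the resulting command without running it. The helper build_acl_rule is made up for illustration, and the field interpretation is inferred from the talk's example rather than taken from reference documentation.

```shell
#!/bin/sh
# Build an "allow" rule string from its parts.
#   $1 = category name without the leading @
#   $2 = path pattern inside the category (quote "*" so the shell leaves it alone)
#   $3 = absolute path of the process binary
build_acl_rule() {
    printf 'allow @%s %s %s\n' "$1" "$2" "$3"
}

rule=$(build_acl_rule hdfs '*' /usr/java/jdk1.7.0_67-cloudera/bin/java)
# Dry run: print the command instead of executing it.
echo "navencrypt acl --add --rule=\"$rule\" --profile=profile.txt"
```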

Hadoop Encryption: Verify Navigator Encrypt ACL Addition

    [root@hdfs-2 ~]# navencrypt acl --list --all
    Type MASTER passphrase:
    #  Type   Category  Path  Profile  Process
    1  ALLOW  @hdfs     *     YES      /usr/java/jdk1.7.0_67-cloudera/bin/java
    PROFILE:
    {"uid":"496","cmdline":"/usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_datanode -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-cmf-HDFS-1-DATANODE-hdfs-2.vpc.cloudera.com.log.out -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode","comm":"java"}

Hadoop Encryption: Stopping the cluster


Hadoop Encryption: Navigator Encrypt Data Encryption

    [root@hdfs-2 ~]# navencrypt-move encrypt @hdfs /data/dfs/dn/current/ /mnt/mountpoint/
    Type MASTER passphrase:
    Size to encrypt: 12 KB
    Moving from: '/data/dfs/dn/current'
    Moving to: '/mnt/mountpoint/hdfs/data/dfs/dn/current'
    100% [=======================================================>] [ 345 B]
    Done.
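
The "Moving to" line suggests that the encrypted copy lands under the mount point, in a directory named after the category, followed by the original absolute path. The sketch below reproduces that path arithmetic. It is inferred from this single output, so treat the layout as an assumption; the helper name encrypted_target is made up for illustration.

```shell
#!/bin/sh
# Compute where data appears to end up after a move, based on the output above:
#   <mountpoint>/<category without @>/<original absolute path>
#   $1 = category (with or without leading @)
#   $2 = source directory
#   $3 = encrypted mount point
encrypted_target() {
    category="${1#@}"   # drop a leading @ if present
    src="${2%/}"        # drop one trailing slash
    mnt="${3%/}"
    printf '%s/%s%s\n' "$mnt" "$category" "$src"
}

encrypted_target @hdfs /data/dfs/dn/current/ /mnt/mountpoint/
```

With the arguments from the command above, this prints the same destination that the tool reported.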

Hadoop Encryption: Starting the Cluster

Hadoop Encryption: HDFS Test

    [root@hdfs-2 ~]# su - hdfs
    [hdfs@hdfs-2 ~]$ touch file.txt
    [hdfs@hdfs-2 ~]$ hdfs dfs -mkdir /data/
    [hdfs@hdfs-2 ~]$ hdfs dfs -copyFromLocal file.txt /data/file.txt
    [hdfs@hdfs-2 ~]$ hdfs dfs -ls /data/
    Found 1 items
    -rw-r--r--   2 hdfs supergroup          0 2015-05-20 13:50 /data/file.txt

Cassandra Encryption

    # ps aux | grep cassandra
    root 15109 22.4 27.0 6347932 4143708 pts/0 SLl 00:22 0:08 java -ea -javaagent: /apache-...
    # navencrypt-profile --pid=15109 > cassandra.profile
    # navencrypt acl --add --rule="allow @cassandra * /usr/lib/jvm/java-6-oracle/jre/bin/java" --profile=cassandra.profile
    # navencrypt-move encrypt @cassandra /var/lib/cassandra/ /mnt/encrypted-mountpoint
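
The Hadoop and Cassandra walkthroughs follow the same three navencrypt steps: capture a profile, add an ACL that references it, then move the data. The sketch below prints those steps for an arbitrary service as a dry run. It uses only the command forms shown in the talk; the function name navencrypt_plan and the placeholder /path/to/java are hypothetical. As in the Hadoop example, the service would need to be stopped before the move step and started again afterwards.

```shell
#!/bin/sh
# Dry-run sketch: print the navencrypt steps for one Java-based service.
#   $1 = PID of the running service     $2 = category name (no @)
#   $3 = path of the java binary        $4 = data directory to encrypt
#   $5 = encrypted mount point
navencrypt_plan() {
    pid="$1"; category="$2"; java_bin="$3"; data_dir="$4"; mountpoint="$5"
    profile="$category.profile"
    # 1. Capture the process profile while the service is still running.
    echo "navencrypt-profile --pid=$pid > $profile"
    # 2. Allow only that Java process (matched by its profile) into the category.
    echo "navencrypt acl --add --rule=\"allow @$category * $java_bin\" --profile=$profile"
    # 3. With the service stopped, move the data onto the encrypted mount point.
    echo "navencrypt-move encrypt @$category $data_dir $mountpoint"
}

# /path/to/java is a placeholder; use the binary shown by ps for the service.
navencrypt_plan 15109 cassandra /path/to/java /var/lib/cassandra/ /mnt/encrypted-mountpoint
```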

Thank you

alex.gonzalez@cloudera.com