A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud




A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud
Thuy D. Nguyen, Cynthia E. Irvine, Jean Khosalim
Department of Computer Science
Ground System Architectures Workshop, March 18-21, 2013
Published by The Aerospace Corporation with permission.

Introduction

Motivation
- Develop a cross-domain MapReduce framework for a multilevel secure (MLS) cloud, allowing users to analyze data at different security classifications

Topics
- Apache Hadoop framework
- MLS-aware Hadoop Distributed File System
- Concept of operations
- Requirements, design, and implementation
- Future work and conclusion

Apache Hadoop

Open-source software framework for reliable, scalable, distributed computing
Inspired by Google's MapReduce computational paradigm and the Google File System (GFS)
Two main subprojects:
- Hadoop Distributed File System (HDFS)
- Hadoop MapReduce
Supports distributed computing on massive data sets on clusters of commodity computers
Common usage patterns:
- ETL (Extract, Transform, Load) replacement
- Data analytics, machine learning
- Parallel processing platforms (Map without Reduce)
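As a concrete illustration of the MapReduce paradigm the slides build on, here is a minimal word-count sketch in plain Python, with no Hadoop dependency; `map_fn` and `reduce_fn` are hypothetical stand-ins for the roles a Hadoop Mapper and Reducer play, and the in-memory grouping step stands in for the shuffle:

```python
from collections import defaultdict

def map_fn(line):
    # Mapper: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reducer: sum all counts emitted for a given word.
    return (word, sum(counts))

def run_job(lines):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

result = run_job(["hadoop stores data", "hadoop processes data"])
```

In a real Hadoop job the map and reduce functions run on many machines and the shuffle moves data over the network; the control flow, however, is the same.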

Hadoop Architecture

[Diagram: clients contact the Name Node for HDFS metadata access and Data Nodes for HDFS block access; the Job Tracker receives MapReduce job submissions and coordinates Task Trackers. Intra-HDFS operations run among the Name Node, Secondary NameNode, and Data Nodes. The figure distinguishes client-to-server communications from internal server-to-server communications and intra-MR-Engine operations.]

MLS-Aware: A Definition

A component is considered MLS-aware if it executes without privileges in an MLS environment, and yet takes advantage of that environment to provide useful functionality.

Examples:
- Reading from resources labeled at the same or lower security levels
- Making access decisions based on the security level of the data
- Returning the security level of the data

Objective and Approach

Objective
- Extend Hadoop to provide a cross-domain read-down capability without requiring the Hadoop server components to be trustworthy

Approach
- Modify Hadoop to run on a trusted platform that enforces an MLS policy on the local file system
  - Use Security-Enhanced Linux (SELinux) for the initial prototype
- Modify HDFS to be MLS-aware
  - Multiple single-level HDFS instances, each cognizant of HDFS namespaces at lower security levels
  - HDFS servers running at a given security level can access file objects at lower levels as permitted by the underlying trusted computing base (TCB)
- No trusted processes outside the TCB boundary

HDFS Concept of Operation

User session level
- Implicitly established by the security level of the receiving network interface and TCP/IP ports

File access policy rules
- A user can read and write file objects at the user's session level
- A user can read file objects if the user's session level dominates the level of the requested object

File system abstraction
- The HDFS interface is similar to a UNIX file system
- Traditional Hadoop cluster: one file system
- MLS-enhanced cluster: multiple file systems, one per security level
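The file access policy rules above reduce to a simple dominance check. The sketch below is a hypothetical illustration only; the level names and the linear ordering are assumptions (a real MLS lattice may also carry compartments), not details taken from the prototype:

```python
# Hypothetical, linearly ordered security levels (illustrative only).
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}

def dominates(a, b):
    # Level a dominates level b if a is at least as high as b.
    return LEVELS[a] >= LEVELS[b]

def can_read(session_level, object_level):
    # Read-down: allowed when the session level dominates the object level.
    return dominates(session_level, object_level)

def can_write(session_level, object_level):
    # Writes are confined to the user's own session level.
    return session_level == object_level
```

Note that `can_write` is stricter than `can_read`: confining writes to the session level is what prevents information from flowing downward.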

HDFS File Organization

- The root directory at a particular level is expressed as /<user-defined security-level-indicator>
- Each security-level-indicator is administratively assigned to an SELinux sensitivity level
- The traditional root directory (/) is the root at the user's session level
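The naming scheme above can be sketched as a small path-resolution helper. The indicator names and SELinux sensitivity labels below are invented for illustration; only the convention (a path rooted at /<indicator> names that level's namespace, and a plain / defaults to the session level) comes from the slide:

```python
# Hypothetical administrative mapping from user-defined security-level
# indicators to SELinux sensitivity levels (all names are illustrative).
INDICATOR_TO_SELINUX = {"uncl": "s0", "conf": "s1", "secr": "s2"}

def resolve_level(path, session_indicator):
    """Return the (indicator, selinux_level) implied by an HDFS path.

    A path rooted at /<indicator> names that level's namespace; the
    traditional root (/) is treated as the user's session level.
    """
    parts = [p for p in path.split("/") if p]
    if parts and parts[0] in INDICATOR_TO_SELINUX:
        indicator = parts[0]
    else:
        indicator = session_indicator
    return indicator, INDICATOR_TO_SELINUX[indicator]
```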

MLS-aware Hadoop Design

Multiple single-level HDFS server instances are co-located on the same physical node
- All NameNode instances run on the same physical node
- DataNode instances are distributed across multiple physical nodes
  - Authoritative DataNode instance: owner of the local files used to store HDFS blocks
  - Surrogate DataNode instance: handles read-down requests on behalf of an authoritative DataNode instance running at a lower level
A configuration file defines the allocation of authoritative and surrogate DataNode instances on different physical nodes
The design does not impact the MapReduce subsystem
- The JobTracker and TaskTracker only interact with the NameNode and DataNode as HDFS clients
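The slide mentions a configuration file that allocates DataNode instances to physical nodes, but does not show its format. The sketch below is purely hypothetical: it assumes a linear level ordering and invented node names, and derives which surrogate instances a node would host (one per level above the node's authoritative level, so higher-level sessions can read down into that node's data):

```python
# Hypothetical cluster allocation; node names and levels are assumptions.
ORDER = ["uncl", "conf", "secr"]  # low to high

# Authoritative DataNode level hosted on each physical node.
AUTHORITATIVE = {"node1": "uncl", "node2": "conf", "node3": "secr"}

def surrogates_for(node):
    # Surrogate DataNode instances run at each level ABOVE the node's
    # authoritative level, co-located so they can serve read-down
    # requests against that node's lower-level local files.
    idx = ORDER.index(AUTHORITATIVE[node])
    return ORDER[idx + 1:]
```

Under this assumed scheme, the node holding the lowest-level data hosts the most surrogate instances, and the node at the highest level hosts none.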

MLS-enhanced Hadoop Cluster

[Diagram: allocation of processes across the physical nodes of the MLS-enhanced cluster.]

Cross-domain Read-down

Client running at the user's session level
- Contacts the NameNode at the same level to request a file at a lower level

NameNode instance at the session level
- Obtains the metadata of the requested file, and the storage locations of its blocks, from the NameNode instance running at the lower security level
- Directs the client to contact the surrogate DataNode instances that are co-located with the file's primary DataNode instances

Surrogate DataNode instance at the session level
- Looks up the locations of the local files used to store the requested blocks
- Reads the local files and returns the requested blocks; the security level of these local files is lower than the session level
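The three-step sequence above can be simulated in a few lines. Everything below is an illustrative assumption (the metadata layout, block identifiers, and node names are invented); only the control flow mirrors the slide:

```python
# Hypothetical in-memory simulation of the read-down sequence.
# Per-level NameNode metadata: file path -> list of (block_id, node).
NAMENODE = {
    "uncl": {"/uncl/report.txt": [("blk_1", "node1")]},
    "secr": {},
}

# Per-(node, level) local block store of the authoritative DataNode.
LOCAL_BLOCKS = {("node1", "uncl"): {"blk_1": b"low-level data"}}

def read_down(session_level, file_level, path):
    # 1. The session-level NameNode obtains metadata from the
    #    NameNode instance at the lower level.
    block_map = NAMENODE[file_level][path]
    data = b""
    for block_id, node in block_map:
        # 2. The client is directed to a surrogate DataNode at the
        #    session level, co-located on the same physical node.
        # 3. The surrogate looks up and reads the lower-level local
        #    file that stores the block, and returns it.
        data += LOCAL_BLOCKS[(node, file_level)][block_id]
    return data
```

The key property is that no process runs with privilege: the surrogate can read the lower-level files only because the underlying TCB's MLS policy permits read-down.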

Read-down Example

[Diagram: worked example of a read-down request flowing through the session-level NameNode and a surrogate DataNode instance.]

Source Lines of Code (SLOC) Metric

Used the open-source Count Lines of Code (CLOC) tool
- Can calculate differences in blank, comment, and source lines

Summary of code modification
- The delta value is the sum of additions, removals, and modifications of source lines
- The overall change is less than 5%
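The delta metric above is simple arithmetic; the sketch below spells it out. The line counts are made-up placeholders, not the prototype's actual figures:

```python
def sloc_delta_percent(added, removed, modified, baseline_sloc):
    # Delta = additions + removals + modifications of source lines,
    # reported as a percentage of the original source line count.
    delta = added + removed + modified
    return 100.0 * delta / baseline_sloc

# Illustrative numbers only: a 2,000-line delta against a 50,000-line
# baseline is a 4% overall change, i.e. under the 5% reported.
pct = sloc_delta_percent(added=1200, removed=300, modified=500,
                         baseline_sloc=50000)
```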

Future Work and Conclusions

Future work
- Adding read-down support to HDFS Federation
- Implementing an external Cache Manager
- Investigating Hadoop's use of Kerberos for establishing user sessions at different security levels
- Performing benchmark testing with larger datasets

Conclusions
- The prototype is the first step towards developing a highly secure MapReduce platform
- It does not introduce any trusted processes outside the pre-existing TCB boundary
- The changes only affect the HDFS servers

Contact

Cynthia E. Irvine, PhD, irvine@nps.edu
Thuy D. Nguyen, tdnguyen@nps.edu
Center for Information Systems Security Studies and Research
Department of Computer Science
Naval Postgraduate School, Monterey, CA 93943, U.S.A.
http://www.cisr.us