Enhancing UNICORE Storage Management using Hadoop



Similar documents
UNICORE s UNICORECC Toolbox and SDKs

Anwendungsintegration und Workflows mit UNICORE 6

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Hadoop IST 734 SS CHUNG

BIG DATA TRENDS AND TECHNOLOGIES

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Hadoop Architecture. Part 1

A very short Intro to Hadoop

THE HADOOP DISTRIBUTED FILE SYSTEM

Map Reduce & Hadoop Recommended Text:

Chapter 7. Using Hadoop Cluster and MapReduce

Hadoop Distributed File System (HDFS) Overview

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Open source Google-style large scale data analysis with Hadoop

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

From Wikipedia, the free encyclopedia

Distributed File Systems An Overview. Nürnberg, Dr. Christian Boehme, GWDG

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

Hadoop Parallel Data Processing

Apache Hadoop FileSystem and its Usage in Facebook

Design and Evolution of the Apache Hadoop File System(HDFS)

Accelerating and Simplifying Apache

Distributed Filesystems

Hadoop Ecosystem B Y R A H I M A.

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

Data processing goes big

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Hadoop. Sunday, November 25, 12

Hadoop & its Usage at Facebook

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Hadoop & its Usage at Facebook

Snapshots in Hadoop Distributed File System

Large scale processing using Hadoop. Ján Vaňo

Case Study : 3 different hadoop cluster deployments

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Big Data Analytics. Lucas Rego Drumond

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

The Inside Scoop on Hadoop

<Insert Picture Here> Big Data

Application Development. A Paradigm Shift

Distributed File Systems

Implementation of Hadoop Distributed File System Protocol on OneFS Tanuj Khurana EMC Isilon Storage Division

File transfer in UNICORE State of the art

Distributed File Systems

Google Bing Daytona Microsoft Research

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Hadoop implementation of MapReduce computational model. Ján Vaňo

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to HDFS. Prasanth Kothuri, CERN

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

L1: Introduction to Hadoop

Open source large scale distributed data management with Google s MapReduce and Bigtable

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Jeffrey D. Ullman slides. MapReduce for data intensive computing

NoSQL Data Base Basics

MapReduce with Apache Hadoop Analysing Big Data

Monitoring of the UNICORE middleware

UFTP High-performance data transfer for UNICORE

CSE-E5430 Scalable Cloud Computing Lecture 2

Introduction to HDFS. Prasanth Kothuri, CERN

Data Mining in the Swamp

Apache Hadoop FileSystem Internals

Apache Hadoop. Alexandru Costan

A Service for Data-Intensive Computations on Virtual Clusters

Bachelorarbeit. Using Cloud Computing Resources in Grid Systems: An Integration of Amazon Web Services into UNICORE 6

Hadoop and its Usage at Facebook. Dhruba Borthakur June 22 rd, 2009

Hadoop: Embracing future hardware

Distributed Storage Management Service in UNICORE

BIG DATA What it is and how to use?

On Enabling Hydrodynamics Data Analysis of Analytical Ultracentrifugation Experiments

Hadoop Distributed File System Propagation Adapter for Nimbus

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

A Brief Outline on Bigdata Hadoop

Like what you hear? Tweet it using: #Sec360

Data-Intensive Computing with Map-Reduce and Hadoop

How To Use Hadoop

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Big Data Explained. An introduction to Big Data Science.

Hadoop and Map-Reduce. Swati Gore

Apache Hadoop: Past, Present, and Future

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Introduction to Cloud Computing

Big Data Analytics. Lucas Rego Drumond

HDFS and Availability Data Retragement

Transcription:

Enhancing UNICORE Storage Management using Hadoop Distributed ib t File System Wasim Bari 2, Ahmed Shiraz Memon 1, Dr. Bernd Schuller 1 1. Jülich Supercomputing Centre, Forschungszentrum Jülich & 2. Institute of Scientific ifi Computing, RWTH Aachen Tuesday, 25 th August, 2009

Contents Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Motivation Hadoop UNICORE UniHadoop Demonstration Summary 2

Motivation Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Computation power has increased exponentially More Computation with advanced technologies Grid Computing, Parallel computing, High Performance Computing Environment Modeling High Energy and Nuclear Physics And many more in Feasible time Applications are producing HUGE data Big Bang, Large Hadron Collider at CERN 15 Petabyte annually 3

Motivation Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Traditional Data Storage Systems cannot store such amount of data Limited Storage Disaster Recovery Number of Files Parallel Access Distributed Data Storage System Elastic expandable Storage Disaster Recovery Fast and Parallel Access Durable Replication Number of Files 4

Motivation Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A UNICORE supports simple Storage mechanisms with the Filesystem located at TargetSystem Site Becomes complicated while using complex workflows Goal: Allows UNICORE to support Distributed Storage System as big shared storage UNICORE Client Distributed Storage System Simple Storage 5

Hadoop Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A In 2003 Proposed a highly scalable, fault tolerant, dynamic Distributed File System called Google File System (GFS). In 2004, introduced d MapReduce programming model, A mechanism to achieve parallelism. Apache Hadoop is an Open Source implementation of GFS and MapReduce Main contributor is Hadoop Distributed File System (HDFS) Scalable Economical Efficient and Reliable 6

Hadoop Distributed Filesystem (HDFS) Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Very Large Distributed File System 10K nodes, 100 million files, 10 PB Master/Worker Architecture Single Master (NameNode)/Many Workers (DataNode) Single Namespace for entire cluster Blocks: Files are broken up into smaller chunks Typically 128 MB block size Each block replicated on multiple DataNodes Client Finds location of blocks Accesses data directly from DataNode Security Permission i model similar il to POSIX 7

HDFS: Reading a file Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Courtesy: Hadoop: The Definitive Guide, O Reilly publishers, First Edition, 2009 8

Hadoop Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Abstract Filesystem concept Supported File- and Storagessystems Hadoop Distributed File System (hdfs://) Cloudera ( formerly Kosmos File System) (kfs://) Amazon S3 File System S3 Native File System (s3n://) S3 Block File System (s3://) Local disk (file://) FTP File System (ftp://) Runtime selection of one of the supported Storagesystemst 9

UNICORE Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A A Ready-to-Run, Open Source Grid Middleware that enables seamless and secure access to the Computing resources UNICORE Atomic Services (UAS) Storage Management age e Service (SMS) S) File Transfer Service (FTS) Job Management Service Implementation of standards OASIS, OGSA, WS-I etc BES, ByteIO, GLUE2, WS-RF etc 10

UNICORE Architecture Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A 11

UniHadoop: Overview Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A UniHadoop is an integration of UNICORE with Apache Hadoop Inspired by Hadoop design Using an Abstract notion of Filesystem Supports all storage/filesystems Specialisation of UNICORE Storage Management Service (SMS). Enable UNICORE clients (UCC, URC etc ) to use Hadoop transparently 12

UniHadoop: Architecture Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Portal client, e.g. GridSphere command - line client Eclipse - based client Programming API Gateway Hadoop Gateway Hadoop UNICORE Atomic Services XNJS XACML entity SMS FTS IDB OGSA - * UNICORE WS - RF hosting env. CIS Information Service Service Registry UNICORE WS - RF UNICORE Atomic Services XNJS XACML entity SMS FTS IDB OGSA - * UNICORE WS - RF hosting env. UVOS XUUDB Target System Interface Target System Interface Local RMS (e.g. Torque, LL, LSF, etc.) Local RMS (e.g. Torque, LL, LSF, etc.) USpace DSS External Storage USpace 13

UniHadoop: Design Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A StorageManagement: Base Interface List directory create directory change permissions etc... SMSBaseImpl: Basic implementation FixedStorageImpl: A storage service files for fixed path, such as /work SMSHadoopImpl: SMS Adaptation for Hadoop IStorageAdapter: Interface for accessing low level l hierarchical storage systems HadoopStorageAdapter: HDFS specific implementation 14

UniHadoop: Usage Scenario 1 Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Default Shared Storage Huge workflows Target System 1 Target System 2 Target System 3 Target System N Default Hadoop Shared Storage 15

UniHadoop: Usage Scenario 2 Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A A shared storage for every Target System Target System 1 Target System 2 Target System 3 Target System N Hadoop Storage Hadoop Storage Hadoop Storage Hadoop Storage 16

Demonstration Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Listing directories/files on HDFS Creating files Submitting jobs using data from HDFS 17

Feature Matrix Contents Motivation Hadoop UNICORE UniHadoop - Summary -Q&A Features Default SMS Hadoop SMS Unlimited Storage Durability Unlimited File Size Disaster Recovery Unlimited it No. Of files Χ Χ Χ Χ Χ Trash Recovery Χ 18

Summary Contents Motivation Hadoop UNICORE UniHadoop - Extensions Summary -Q&A Requirements Architecture and Design Usage scenarios Hadoop integration provide support for MapReduce Data analysis and intelligence using Pig, Hive and HBase A number of available File- and Storagesystems Amazon S3, Cloudera, KFS etc 19

Q & A Contents Motivation Hadoop UNICORE UniHadoop - Summary - Q&A Questions 20

Contact a.memon@fz-juelich.de UNICORE-Hadoop: Bundled in release 6.2.1 server, www.unicore.eu Source: http://unicore.svn.sourceforge.net/viewvc/unicore/unicorex/trunk/uas-hadoop/ 21