Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing




Garhan Attebury (1), Andrew Baranovski (2), Ken Bloom (1), Brian Bockelman (1), Dorian Kcira (3), James Letts (4), Tanya Levshina (2), Carl Lundestedt (1), Terrence Martin (4), Will Maier (5), Haifeng Pi (4), Abhishek Rana (4), Igor Sfiligoi (4), Alexander Sim (6), Michael Thomas (3), Frank Wuerthwein (4)

1. University of Nebraska-Lincoln
2. Fermi National Accelerator Laboratory
3. California Institute of Technology
4. University of California, San Diego
5. University of Wisconsin-Madison
6. Lawrence Berkeley National Laboratory

On behalf of the Open Science Grid (OSG) Storage Hadoop Community

Storage, a Critical Component of the Grid

Grid computing is data- and CPU-intensive, which requires:
- A scalable management system for bookkeeping and discovering data
- Reliable and fast tools for distributing and replicating data
- Efficient procedures for processing and extracting data
- Advanced techniques for analyzing and storing data in parallel

A scalable, dynamic, efficient, and easy-to-maintain storage system is on the critical path to the success of grid computing:
- Meet various data access needs at both the organizational and individual level
- Maximize CPU usage and efficiency
- Fit into sophisticated VO policies (e.g., data security, user privileges)
- Survive unexpected usage of the storage system
- Minimize the cost of ownership
- Easy to expand, reconfigure, and commission/decommission as requirements change

A Case Study: Requirements for a Storage Element (SE) at the Compact Muon Solenoid (CMS)

- Have a credible support model that meets the reliability, availability, and security expectations consistent with the computing infrastructure
- Demonstrate the ability to interface with the existing global data transfer system and the transfer technology of SRM tools and FTS, as well as the ability to interface with the CMS software locally through ROOT
- Well-defined and reliable behavior for recovery from the failure of any hardware component
- Well-defined and reliable method of replicating files to protect against the loss of any individual hardware system
- Well-defined and reliable procedure for decommissioning hardware without data loss
- Well-defined and reliable procedure for site operators to regularly check the integrity of all files in the SE
- Well-defined interfaces to monitoring systems
- Capable of delivering at least 1 MB/s per batch slot to CMS applications, and of writing files from the WAN at at least 125 MB/s while simultaneously writing data from the local farm at an average rate of 20 MB/s
- Job failures due to failure to open a file or deliver data products from the storage system should be below the level of 1 in 10^5

Hadoop Distributed File System (HDFS)

- An open source project hosted by Apache (http://hadoop.apache.org) and used by Yahoo! for its search engine, involving data at the multi-petabyte scale
- Design goals:
  - Reduce the impact of hardware failure
  - Streaming data access; handle large datasets
  - Simple coherency model
  - Portability across heterogeneous platforms
- A scalable distributed cluster file system:
  - The namespace and image of the whole file system are maintained in a single machine's memory: the NameNode
  - Files are split into blocks and stored across the cluster on DataNodes
  - File blocks can be replicated, so the loss of one DataNode can be recovered from the replica blocks on other DataNodes
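As a minimal illustration of this client/NameNode/DataNode split, the sketch below writes and reads a file through the standard HDFS Java API; the client talks to the NameNode for metadata while block data flows to and from DataNodes. The NameNode URI and file path are placeholder assumptions, not a site's actual configuration.

// Minimal sketch: write and read a file via the HDFS client API.
// "hdfs://namenode:9000/" and the path are illustrative placeholders;
// a real site would set fs.default.name in core-site.xml instead.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);

        Path p = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(p, true)) {
            out.writeUTF("stored as replicated HDFS blocks"); // blocks land on DataNodes
        }
        try (FSDataInputStream in = fs.open(p)) {
            System.out.println(in.readUTF()); // metadata lookup via the NameNode
        }
        fs.close();
    }
}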

Important Components of an HDFS-based SE

- FUSE/fuse-dfs: FUSE is a Linux kernel module that allows file systems to be implemented in userspace; fuse-dfs uses it to present a POSIX-like interface to HDFS. This is important for software applications accessing data in the local SE (see the sketch below).
- Globus GridFTP: provides WAN transfers between two SEs, or between an SE and a worker node (WN). A special plugin is needed to assemble asynchronously transferred packets for sequential writing to HDFS when multiple streams are used.
- BeStMan: provides the SRM interface to HDFS. Plugins can be developed and implemented to select GridFTP servers according to their status.
- A number of software bugs and integration issues have been solved over the last 12 months to bring all the components together and make a production-quality SE.
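The point of the FUSE layer is that applications need no Hadoop library at all. A minimal sketch, assuming fuse-dfs has mounted HDFS at /mnt/hadoop (a common but site-dependent mount point) and that the file below exists: the application reads it with ordinary POSIX-style I/O.

// Minimal sketch: reading HDFS through an assumed fuse-dfs mount point
// (/mnt/hadoop) with plain java.io, exactly as a local file.
import java.io.BufferedReader;
import java.io.FileReader;

public class FuseRead {
    public static void main(String[] args) throws Exception {
        // Assumed mount point and file; no Hadoop client library involved.
        String path = "/mnt/hadoop/user/demo/hello.txt";
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}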

HDFS SE Architecture for Scientific Computing

[Architecture diagram] The SE comprises a NameNode (with a secondary NameNode); a BeStMan node running FUSE and the Hadoop client; GridFTP nodes with FUSE and the Hadoop client; dedicated DataNodes running the Hadoop client; worker nodes that double as DataNodes and GridFTP servers, each with FUSE and the Hadoop client; and a GUMS service for proxy-to-user mapping.

HDFS-based SE at CMS Tier-2 Sites

- Three CMS Tier-2 sites (Nebraska, Caltech, and UCSD) have deployed an HDFS-based SE
- On average 6-12 months of operational experience, with the total disk space increasing over time; currently around 100 DataNodes and 300 to 500 TB at each site
- Successfully serves the CMS collaboration, with up to thousands of grid users and hundreds of local users accessing datasets in HDFS
- Successfully serves CMS data operations and Monte Carlo production
- Benefits the new SE brings to these sites:
  - Reliability: file loss is prevented by the sound file replication scheme run by HDFS
  - Simple deployment: most of the deployment procedure is streamlined, with few commands required of administrators
  - Easy operation: a stable system, little effort for system/file recovery, and less than 30 minutes per day for operation and user support
  - Proven scalability in supporting a large number of simultaneous read/write operations and high throughput in serving data to grid jobs running at the site

Highlights of the Operational Performance of the HDFS-SE

- Stably delivers ~3 MB/s to applications in the cluster while it is fully loaded with jobs; sufficient for CMS applications' I/O requirements with high CPU efficiency (CMS applications are IOPS-limited, not bandwidth-limited)
- The HDFS NameNode serves 2500 user requests per second; sufficient for a cluster with thousands of cores running I/O-intensive jobs
- Sustained WAN transfer rate of 400 MB/s; sufficient for CMS Tier-2 data operations (dataset transfers and stage-out of user analysis jobs)
- BeStMan simultaneously processes thousands of client requests, with a sustained endpoint processing rate of 50 Hz; sufficient for high-rate transfers of gigabyte-sized files and uncontrolled, chaotic user jobs
- Extremely low observed file corruption rate, benefiting from HDFS's robust and fast file replication
- Decommissioning a DataNode takes less than 1 hour; the NameNode restarts in 1 minute; checking the file system image (from NameNode memory) takes 10 seconds; fast and efficient for operations
- Survived various stress tests involving HDFS, BeStMan, GridFTP, and more

Data Transfer to HDFS-SE [plot]

NameNode Operation Count [plot]

Processing Rate at the SRM Endpoint [plot]

Monitoring and Routine Tests

- Integration with the general grid monitoring infrastructure: Nagios, Ganglia, MonALISA; CPU, memory, and network statistics for the NameNode, DataNodes, and the whole system (a simple probe sketch follows this list)
- HDFS monitoring: the Hadoop web service, Hadoop Chronicle, and JConsole; status of the file system and its users
- Logs of the NameNode, DataNodes, GridFTP, and BeStMan, as part of daily tasks and debugging activities
- Regular low-stress tests performed by the CMS VO: test analysis jobs and file transfer load tests
- Part of the daily commissioning of the site involves local and remote I/O of the SE
- Intentional failures in various parts of the SE with demonstrated recovery mechanisms; documented recovery procedures
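As a hedged illustration of what a site-side probe can look like (this is not the actual OSG/Nagios check), the sketch below queries overall HDFS usage through the long-standing getContentSummary() call; the NameNode URI, path, and threshold are placeholder assumptions.

// Illustrative Nagios-style probe, not the sites' actual check: report
// file/directory counts and consumed space, exit non-zero past a threshold.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsUsageProbe {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000/"), new Configuration());
        ContentSummary cs = fs.getContentSummary(new Path("/"));
        long usedTB = cs.getSpaceConsumed() / (1L << 40); // includes replicas
        System.out.println("files=" + cs.getFileCount()
                + " dirs=" + cs.getDirectoryCount()
                + " usedTB=" + usedTB);
        // Nagios convention: non-zero exit signals a warning condition.
        System.exit(usedTB > 400 ? 1 : 0); // 400 TB threshold is illustrative
    }
}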

Load Test Between Two HDFS-SEs [plot]

Data Security and Integrity

- Security concerns:
  - HDFS: no encryption or strong authentication between client and server, so HDFS must only be exposed to a secure internal network; in practice, a firewall or NAT is needed to properly isolate HDFS from direct public access. The latest HDFS implements access tokens, and a transition to Kerberos-based components is expected in 2010.
  - Grid components (GridFTP and BeStMan): use standard GSI security with VOMS extensions
- Data integrity and consistency of the file system (see the checksum sketch below):
  - HDFS checksums blocks of data; command line tools check blocks, directories, and files
  - HDFS keeps multiple journals and file system images
  - The NameNode periodically requests the entire block report from all DataNodes
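The command line tool the slide refers to is HDFS's own checking utility; as a minimal programmatic counterpart, the sketch below performs a client-side integrity spot check through the standard getFileChecksum() call. The NameNode URI and file path are assumptions for illustration.

// Minimal sketch: look up the file-level checksum that HDFS derives from
// its per-block CRCs. Returns null on file systems without checksum support.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000/"), new Configuration());
        Path p = new Path("/user/demo/hello.txt"); // assumed file
        FileChecksum sum = fs.getFileChecksum(p);
        System.out.println(p + " -> "
                + (sum == null ? "no checksum available" : sum));
    }
}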

A Combined Release Infrastructure at OSG and CMS

- Various upstream open source projects provide all the necessary packages: HDFS, FUSE, BeStMan, GridFTP plugins, BeStMan plugins, and more
- All software components needed to deploy the Hadoop-based SE are packaged as RPMs, with the add-on configuration and scripts a site needs to install with minimal changes for its conditions and requirements
- Consistency checks and validation are done at selected sites with HDFS-SE experts before the formal release via OSG; a testbed covers common platforms and scalability tests
- Development in 2010:
  - The release procedure is to be fully integrated into the standard OSG distribution, the Virtual Data Toolkit (VDT)
  - Possible intersection with external commercial packagers, e.g., using selected RPMs from Cloudera

Site-Specific Optimization

Various optimizations can be made at each site based on its usage patterns and local hardware conditions (a per-file tuning sketch follows this list):
- Block size for files
- Number of file replicas
- Architecture of the GridFTP server deployment: a few high-performance GridFTP servers vs. many GridFTP servers running on the worker nodes
- Memory allocation at the worker node (WN) for GridFTP, applications, etc.
- Selection of GridFTP servers: real-time-monitoring-based selection using CPU and memory usage vs. randomly picking a live GridFTP server
- Data access with MapReduce: a special case for data processing
- Rack awareness
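Block size and replica count, the first two knobs above, can be set per file at creation time as well as cluster-wide in hdfs-site.xml. A minimal sketch, assuming the NameNode URI and path below; the 128 MB block size and replica counts are illustrative values, not the sites' actual settings.

// Minimal sketch of per-file tuning via the create() overload that takes
// replication and block size; values here are illustrative placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class TunedCreate {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000/"), new Configuration());
        Path p = new Path("/user/demo/large-dataset.root"); // assumed path
        short replicas = 2;                  // number of file replicas
        long blockSize = 128L * 1024 * 1024; // 128 MB blocks
        try (FSDataOutputStream out =
                 fs.create(p, true, 4096, replicas, blockSize)) {
            out.write(new byte[]{0}); // placeholder payload
        }
        // Replication can also be raised or lowered after the fact:
        fs.setReplication(p, (short) 3);
    }
}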

Summary of Our Experience

- A Hadoop-based storage solution is established and functioning at the CMS Tier-2 level, an example of data- and CPU-intensive HPC:
  - Flexible architecture involving various grid components
  - Scalable and stable
  - Seamlessly interfaced with various grid middleware
  - Lower costs in deployment, maintenance, and required hardware; significantly reduced manpower and increased QoS
  - Easy to adapt to existing/new hardware and changing requirements
  - Standard release for the whole community; experts available to help solve technical problems
- VOs and grid sites benefit from HDFS's reliable file replication and distribution scheme:
  - High data security and integrity
  - Excellent I/O performance for CPU- and data-intensive grid applications
  - Less administrator intervention
- HDFS has been shown to integrate seamlessly into a grid storage solution for a Virtual Organization (VO) or grid site

Roadmap for the Near Future

- Deployment in a variety of scientific computing projects, experiments, and institutions:
  - As an integrated storage element solution
  - As a storage file system
- Benchmarking performance for HPC with data- and CPU-intensive grid computing:
  - Scalability, stability, usability
  - Integration and efficiency with other tools
- Organization:
  - Seamless integration between the scientific user community and the HDFS development community
  - Consolidation of scientific releases and technical support
  - New development and contributions from the scientific community
  - Funding proposals based on HDFS infrastructure and technology
- Improvement in I/O capacity and full integration as a critical component of the Storage Element
- Operational optimization at different scales of data and compute infrastructure