The Recovery System for Hadoop Cluster
Prof. Priya Deshpande, Dept. of Information Technology, MIT College of Engineering, Pune, India
Darshan Bora, Dept. of Information Technology, MIT College of Engineering, Pune, India

Abstract: Due to the brisk growth of data volumes in many organizations, large-scale data processing has become a demanding topic in industry as well as in academia. Hadoop is widely adopted in cloud computing environments for unstructured data. It is an open-source, Java-based distributed computing framework that supports large-scale distributed data processing. In recent years, the Hadoop Distributed File System (HDFS) has become popular for huge data sets and the streams of operations on them. Availability of Hadoop is an important factor in cloud computing, but in HDFS a Namenode failure affects the performance of the whole cluster: it is a single point of failure. In this paper, we analyse the behaviour of the Namenode and the effects of its failure, and we present a scenario to overcome this failure. Our scenario replicates the Namenode on another Datanode, so that the availability of the metadata increases, which reduces both data loss and delay.

Keywords: Hadoop; Cloud Computing; HDFS; Namenode; availability; failure.

I. INTRODUCTION

Cloud computing is now a mainstream commodity in the IT sector [1]. Accordingly, hardware failures as well as software failures decrease the performance of cloud infrastructure. A failure may have a major impact on the efficiency of an application, or it may cause an application to be temporarily out of service. Cloud infrastructure should overcome these kinds of failures. Hadoop is now a cloud workhorse [2]. In this paper, we focus on HDFS and its failures. As stated later, the working of HDFS is based on the Namenode and Datanodes, while the design of HDFS follows that of GFS, the Google File System [15][16].
Many internet companies depend on Hadoop for their large datasets. Every day they generate data on the order of terabytes; Facebook, for example, generates up to 5 terabytes of data per day. As a computing and storage platform, Hadoop deals with data at this scale. Hadoop is an open-source framework implementing MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level file system called the Hadoop Distributed File System (HDFS) [3]. HDFS is robust and highly scalable.

Figure 1: Hadoop Architecture in a Multi-node Cluster [3][14]

The architectural representation of Hadoop is shown in Figure 1. Hadoop uses a master-slave architecture whose main components are the MapReduce engine and HDFS. The JobTracker and TaskTrackers are the key parts of the MapReduce engine, while the Namenode and Datanodes are the key parts of HDFS. MapReduce deals with computation, while HDFS handles storage. HDFS is a block-structured file system: individual files are broken into fixed-size blocks called chunks. While block sizes in conventional file systems are on the order of 4 KB or 8 KB, HDFS chunk sizes are on the order of megabytes. These chunks are stored across a cluster of one or more machines with data storage capacity; the individual machines in the cluster are referred to as Datanodes. A file can be made up of several chunks stored on different Datanodes. If several Datanodes are involved in serving a file, the file becomes unavailable when any one of those Datanodes fails. HDFS combats this problem by replicating each chunk across a number of Datanodes; by default, 3 Datanodes are selected for replication. In this scenario, it is important for the file system to store its metadata reliably; the node holding the metadata of the data stored on the different Datanodes is referred to as the Namenode [6][12].
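The chunk-and-replica bookkeeping described above can be sketched as a small in-memory mapping. This is only an illustration of the idea, not the real HDFS implementation; all names and the round-robin placement are hypothetical (actual HDFS placement is rack-aware):

```python
# Sketch of Namenode-style metadata: each file is split into fixed-size
# chunks, and each chunk is replicated on (by default) 3 Datanodes.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, a typical HDFS block size
REPLICATION = 3                # default replication factor

def split_into_chunks(file_size):
    """Number of fixed-size chunks needed for a file (ceiling division)."""
    return max(1, -(-file_size // CHUNK_SIZE))

def place_chunks(file_name, file_size, datanodes):
    """Assign each chunk of a file to REPLICATION distinct Datanodes,
    round-robin over the cluster (real HDFS placement is rack-aware)."""
    metadata = {}
    n = len(datanodes)
    for i in range(split_into_chunks(file_size)):
        replicas = [datanodes[(i + r) % n] for r in range(REPLICATION)]
        metadata[(file_name, i)] = replicas
    return metadata

# A 200 MB file becomes 4 chunks, each held by 3 of the 4 Datanodes
meta = place_chunks("log.txt", 200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
```

Losing one Datanode here leaves every chunk with two surviving replicas, which is exactly why a single Datanode failure is tolerable while the single copy of the metadata itself is not.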
The metadata contains information about the data stored on the different Datanodes. In HDFS, the metadata is updated after every transaction or read-write operation. The Namenode is the pillar of the HDFS architecture; therefore, the reliability of the Namenode is of significant value
in HDFS. Whenever the Namenode goes down, the working of HDFS is affected. The rest of the paper is organized as follows. Section II provides brief information about the HDFS architecture and analyses the behaviour of HDFS under failures. The proposed scenario is explained in Section III. The architecture of the proposed scenario is described in Section IV, while Section V concludes.

II. BACKGROUND

A. Hadoop Architecture

Hadoop comes with a distributed file system called HDFS, which stands for Hadoop Distributed File System [5]. HDFS is a distributed file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

Figure 2: Proposed HDFS Architecture in a Multi-node Cluster [3]

HDFS is a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. It stores file system metadata and application data separately. HDFS is built around the idea that the most efficient data processing pattern is write-once, read-many-times: a dataset is typically generated or copied from a source, and various analyses are then performed on that dataset over time. HDFS is optimized for delivering high data throughput, possibly at the expense of latency. Files in HDFS may be written to by a single writer, and writes are always made at the end of the file; there is no support for multiple writers or for modifications at arbitrary offsets. As in other distributed file systems, such as PVFS, Lustre and GFS, HDFS stores metadata on a dedicated server, called the Namenode, while application data are stored on other servers, called Datanodes. All servers are fully connected and communicate with each other using TCP-based protocols. Unlike Lustre and PVFS, the Datanodes in HDFS do not rely on data protection mechanisms such as RAID to make the data durable.
Instead, like GFS, the file content is replicated on multiple Datanodes for reliability. While ensuring data durability, this strategy has the added advantage that data transfer bandwidth is multiplied, and there are more opportunities for locating computation near the needed data. HDFS is designed to store very large data sets reliably and to stream them at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. The architecture of HDFS, and experience using it to manage 40 petabytes of enterprise data, has been reported at Yahoo! [5].

B. Single Point of Failure in HDFS

The HDFS architecture is mainly based on the Namenode and Datanodes, where the Namenode acts as the master and the Datanodes act as slaves. If a Datanode fails, only one machine goes down; the Namenode diverts the work of the failed Datanode to the other available Datanodes [6][9]. But if the Namenode goes down, there is a single point of failure. To mitigate this, the HDFS architecture provides a Secondary Namenode; Fig. 1 shows the architecture of Hadoop with a Secondary Namenode. However, the Secondary Namenode is not a Namenode in the sense that Datanodes cannot connect to it, and in no event can it replace the Namenode in case of its failure. If Hadoop is no longer able to use the Namenode, it needs to copy the latest image and logs elsewhere and restart the whole cluster. This is a time-consuming process, and it also affects the performance of HDFS. To overcome this problem, we propose a system that deals with the single point of failure, i.e. Namenode failure, and recovers the system as early as possible without restarting the cluster.
III. PROPOSED WORK

In the HDFS architecture, a Namenode failure means a single point of failure because, as stated earlier, the Namenode is the pillar of the architecture and contains the metadata of the data stored on the different Datanodes. If the Namenode goes down, it affects the entire cluster. To overcome this single point of failure, our suggestion is to replicate the entire Namenode on another Datanode, called the "Recovery Namenode". The Recovery Namenode updates all the information of the Namenode simultaneously and, after a failure, acts as the Namenode. The Recovery Namenode keeps track of the Namenode and is updated at periodic time intervals. Initially, all the Datanodes send heartbeats to the Namenode; when the Namenode goes down, the Recovery Namenode broadcasts a message to all Datanodes announcing the new Namenode. After that, every Datanode sends its heartbeats to the Recovery Namenode.
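The failover step above can be sketched as follows. This is a minimal illustration of the proposed broadcast, assuming in-process message delivery; the class and function names are hypothetical, not Hadoop APIs:

```python
# Sketch of the proposed failover: Datanodes heartbeat to the current
# Namenode; when it dies, the Recovery Namenode broadcasts its identity
# and every Datanode redirects its future heartbeats to it.
class Datanode:
    def __init__(self, name, namenode):
        self.name = name
        self.target = namenode  # where this Datanode sends heartbeats

    def receive_new_namenode(self, new_namenode):
        self.target = new_namenode  # redirect future heartbeats

def broadcast_new_namenode(recovery_node, datanodes):
    """The Recovery Namenode announces itself to all Datanodes."""
    for dn in datanodes:
        dn.receive_new_namenode(recovery_node)

datanodes = [Datanode(f"dn{i}", "namenode-1") for i in range(4)]
broadcast_new_namenode("recovery-namenode", datanodes)
# every Datanode now heartbeats to the Recovery Namenode
```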
The detailed description of the proposed system is given in the architecture section. It selects the new Namenode from the available Datanodes.

IV. ARCHITECTURE

Figure 3: Proposed HDFS Architecture

There are two cases: the first gives the architecture of the new HDFS before failure, while the second gives the architecture of the new HDFS after failure, with the new Namenode. Let us see how this system works.

A. Selection of the Recovery Namenode

As said earlier, the Namenode is the pillar of the HDFS architecture; considering this, the next node, i.e. the Recovery Namenode, has significant responsibilities. Suppose there are n nodes in the cluster, as shown in the figure below.

Figure 4: Hadoop Cluster

We have to set appropriate methods for the selection of the recovery node so that the selection is fast and efficient. We also have to consider the availability of nodes, so that the selection does not affect the performance of the cluster. Here, m is the Namenode while s_1, s_2, ..., s_n are the Datanodes. From these Datanodes, our proposed scenario selects one node as the Recovery Namenode. As we know, every 3 s each Datanode sends a heartbeat to the Namenode to show its availability. In our scenario, each Datanode sends its heartbeat along with its time of generation t. Every Datanode has its own heartbeat generation times, t_1, t_2, ..., t_n for n nodes. At the Namenode, these times are stored in a log, referred to as the Heartbeat Log, along with the arrival time t' of each Datanode's heartbeat. Now, for each and every Datanode, calculate the time taken by a heartbeat to reach the Namenode:

tt_n = t'_n - t_n

where tt_n is the actual time taken by a heartbeat from Datanode n to the Namenode. For every Datanode we consider the first x readings, and then calculate a mean time for every Datanode as follows:

mt_n = (1/x) * sum(tt_n)

where mt_n is the mean time taken by a heartbeat to reach the Namenode from Datanode n. Now we have a log of all Datanodes with their respective mean times mt_n.
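The two timing formulas above can be sketched directly. This is a minimal illustration of the arithmetic only, with hypothetical function names; it assumes the generation and arrival timestamps share a common clock:

```python
# Sketch of the heartbeat timing in Section IV: the travel time of each
# heartbeat is tt = t' - t (arrival minus generation), and the per-node
# mean over the first x readings is mt = (1/x) * sum(tt).
def travel_times(generated, arrived):
    """tt_i = t'_i - t_i for each heartbeat of one Datanode."""
    return [a - g for g, a in zip(generated, arrived)]

def mean_travel_time(generated, arrived, x):
    """Mean of the first x heartbeat travel times for one Datanode."""
    tts = travel_times(generated, arrived)[:x]
    return sum(tts) / len(tts)

# Heartbeats generated every 3 s and arriving 0.2-0.4 s later:
mt = mean_travel_time([0.0, 3.0, 6.0], [0.2, 3.3, 6.4], x=3)
# mt is approximately (0.2 + 0.3 + 0.4) / 3 = 0.3
```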
By applying the Quicksort algorithm to the Heartbeat Log, an updated list called the Recovery Namenode List is obtained. According to this list, the first node is selected as the Recovery Namenode. At the time of cluster formation, a log is maintained as the nodes register. According to the availability of the Datanodes, the Recovery Namenode List is updated every 600 s.

B. Creating the Recovery Namenode List

For the selection of the Recovery Namenode, we create a Recovery Namenode List, sorted by mean time. The algorithm first checks the heartbeat response of each node. If the node is up, it is added to the recovery node list; otherwise it is ignored. While adding a node, its mean heartbeat travel time is calculated, giving its mean response time. The mean response time is calculated for each and every node that is up, and the list is then sorted by mean response time. The first node in the list becomes the new Recovery Namenode for the cluster.

C. Communication between the Namenode and the Recovery Namenode

Once the Namenode has selected a Recovery Namenode, the communication between the two is an important factor. There is instant messaging from the Namenode to the Recovery Namenode: after a certain time period, the Namenode generates an instant message and sends it to the Recovery Namenode, so that the Recovery Namenode knows the Namenode is alive. If the Namenode fails to send an instant message for 600 s, the Recovery Namenode is declared the new Namenode, and it broadcasts that message to all Datanodes.
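The liveness rule in Section C can be sketched as a simple predicate. The intervals come from the text above; the function name is illustrative:

```python
# Sketch of the liveness rule: the Namenode messages the Recovery
# Namenode periodically, and 600 s of silence means the Namenode is
# declared dead and the Recovery Namenode takes over.
MESSAGE_INTERVAL = 3   # seconds between "I am alive" messages
TIMEOUT = 600          # seconds of silence before declaring failure

def namenode_is_dead(last_message_time, now):
    """True once the silence reaches the 600 s threshold."""
    return now - last_message_time >= TIMEOUT

# ten minutes of silence triggers the takeover
assert not namenode_is_dead(last_message_time=0, now=599)
assert namenode_is_dead(last_message_time=0, now=600)
```

Note the asymmetry: heartbeats are frequent (3 s) while the failure threshold is long (600 s), so transient message loss does not trigger a spurious failover.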
create_recovery_namenode_list()
    get_all_node_list();
    while (list is not empty)
        if (node.heartbeat_response == TRUE)
            node_mean_time = calculate_heartbeat_mean_time(node);
            add_node_to_recovery_namenode_list(node, node_mean_time);
        go_to_next_node;
    quick_sort(recovery_namenode_list with respect to node_mean_time);

calculate_heartbeat_mean_time(node)
    total_travelling_time = 0;
    for (from starting_time up to certain_time)
        hb_start_time[] = get_start_time();
        hb_received_time[] = get_received_time();
        hb_travelling_time = hb_received_time[] - hb_start_time[];
        total_travelling_time = total_travelling_time + hb_travelling_time;
    node_mean_time = total_travelling_time / (certain_time - starting_time);
    return node_mean_time;

Figure 5: Algorithm for Selection of the Recovery Namenode

D. Setting a Checkpoint

The checkpoint method is widely used in recovery models [10]; it allows a system to recover from unpredictable faults. The idea behind it is the saving and restoration of the system state. Here, checkpoints are simply periodic time intervals: on a fixed interval, the Namenode is replicated on the Recovery Namenode. After every 600 s the Namenode is replicated, which means checkpoints are created every 600 s. Checkpoints are set only for the Namenode. Creating periodic checkpoints is the way to protect the metadata of the file system.

E. Availability of the Namenode

In the standard HDFS architecture, the Namenode does not share any information about its failure. To overcome this problem, in our scenario the Namenode generates an instant message and sends it to the Recovery Namenode every 3 s to signal its availability. If the Namenode fails to send an instant message for 600 s, it is declared a dead node. After that, the Recovery Namenode sends a message to all Datanodes announcing that it is the new Namenode and that they should send their heartbeats to it.
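The checkpoint arithmetic in Section D can be sketched as follows; the interval is from the text, and the helper name is illustrative:

```python
# Sketch of periodic checkpointing: Namenode state is replicated onto
# the Recovery Namenode every 600 s, and recovery resumes from the last
# checkpoint taken before the failure.
CHECKPOINT_INTERVAL = 600  # seconds between Namenode replications

def last_checkpoint_before(failure_time):
    """Timestamp of the most recent periodic checkpoint at failure time."""
    return (failure_time // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL

# a Namenode failure at t = 1450 s resumes from the checkpoint at t = 1200 s
```

Metadata updates made in the window between the last checkpoint and the failure are the price of this scheme; shortening the interval narrows that window at the cost of more replication traffic.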
The Recovery Namenode starts its work from the last checkpoint taken before the failure of the Namenode.

F. Failure of the Recovery Namenode

Whenever the Namenode fails, the Recovery Namenode takes its place. When the Recovery Namenode becomes the new Namenode, a new Recovery Namenode is again selected using the same parameters: a new Recovery Namenode List is generated from the available Datanodes in the same way, and checkpoints for the new Namenode are created again. Although this increases the overhead on the Namenode as well as on the Datanodes, it provides high availability.

V. CONCLUSION

In cloud computing, unstructured data storage is a popular issue, and Hadoop deals with unstructured data storage. In this paper, we have studied and analysed the architecture of the Hadoop Distributed File System under Namenode failure. To overcome the single point of failure at the Namenode in HDFS, we have proposed an architecture that increases the reliability as well as the availability of Hadoop. We also focused on the selection of a Recovery Namenode after the failure of the Namenode. The proposed architecture is particularly helpful in the case of an unrecoverable Namenode failure.

REFERENCES

[1] Florin Dinu, T. S. Eugene Ng, "Understanding the Effects and Implications of Compute Node Related Failures in Hadoop", HPDC '12, Delft, The Netherlands, June 18-22, 2012.
[2] Jeffrey Shafer, Scott Rixner, and Alan L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance", ISPASS 2010, March 2010.
[3] Dhruba Borthakur, "The Hadoop Distributed File System: Architecture and Design", The Apache Software Foundation.
[4] Ronald Taylor, Pacific Northwest National Laboratory, Richland, WA, "An Overview of the Hadoop/MapReduce/HBase Framework and Its Current Applications in Bioinformatics", Bioinformatics Open Source Conference 2010.
[5] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System", IEEE.
[6] Mohammad Asif Khan, Zulfiqar A. Memon, Sajid Khan, "Highly Available Hadoop Namenode Architecture", International Conference on Advanced Computer Science Applications and Technologies (ACSAT) 2012, Conference Publishing Services.
[7] Asaf Cidon, Stephen Rumble, Ryan Stutsman, Sachin Katti, John Ousterhout and Mendel Rosenblum, "Copysets: Reducing the Frequency of Data Loss in Cloud Storage".
[8] Faraz Faghri, Sobir Bazarbayev, Mark Overholt, Reza Farivar, Roy H. Campbell and William H. Sanders, "Failure Scenario as a Service (FSaaS) for Hadoop Clusters", SDMCMM '12, Montreal, Quebec, Canada, December 3-4, 2012.
[9] Florin Dinu, T. S. Eugene Ng, "Analysis of Hadoop's Performance under Failures".
[10] Jorge-Arnulfo Quiane-Ruiz, Christoph Pinkel, Jorg Schad, Jens Dittrich, "RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures", SIGMOD '11, Athens, Greece, June 12-16, 2011.
[11] Big Data Hadoop HDFS and MapReduce.
[12] Hadoop Getting Started Guide.
[13] Hadoop Tutorial.
[14] Understanding Hadoop Clusters and the Network.
[15] Hadoop in Practice.
[16] The Building Blocks of Hadoop.

Author Information:
Prof. Priya Deshpande, Assistant Professor, MITCOE, Pune, priyardeshpande@gmail.com
Mr. Darshan Bora, ME Student, MITCOE, Pune, darshanbora@hotmail.com
More informationHadoopRDF : A Scalable RDF Data Analysis System
HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn
More informationVolume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
More informationWhite Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP
White Paper Big Data and Hadoop Abhishek S, Java COE www.marlabs.com Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP Table of contents Abstract.. 1 Introduction. 2 What is Big
More informationThe Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 wottri_k1@denison.edu ABSTRACT Many cluster owners and operators have
More informationLecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationKeywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
More informationFault Tolerance in Hadoop for Work Migration
1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
More informationBig Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
More informationAnalysing Large Web Log Files in a Hadoop Distributed Cluster Environment
Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
More informationHDFS Architecture Guide
by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationFinding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics
Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2
More informationHadoop Distributed File System. Jordan Prosch, Matt Kipps
Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?
More informationCDH AND BUSINESS CONTINUITY:
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
More informationInternational Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
More informationApache Hadoop new way for the company to store and analyze big data
Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
More informationVerification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster
Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster Amresh Kumar Department of Computer Science & Engineering, Christ University Faculty of Engineering
More informationMinCopysets: Derandomizing Replication In Cloud Storage
MinCopysets: Derandomizing Replication In Cloud Storage Asaf Cidon, Ryan Stutsman, Stephen Rumble, Sachin Katti, John Ousterhout and Mendel Rosenblum Stanford University cidon@stanford.edu, {stutsman,rumble,skatti,ouster,mendel}@cs.stanford.edu
More informationand HDFS for Big Data Applications Serge Blazhievsky Nice Systems
Introduction PRESENTATION to Hadoop, TITLE GOES MapReduce HERE and HDFS for Big Data Applications Serge Blazhievsky Nice Systems SNIA Legal Notice The material contained in this tutorial is copyrighted
More informationApache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
More informationHadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science
A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org
More informationGeneric Log Analyzer Using Hadoop Mapreduce Framework
Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed
More informationLOCATION-AWARE REPLICATION IN VIRTUAL HADOOP ENVIRONMENT. A Thesis by. UdayKiran RajuladeviKasi. Bachelor of Technology, JNTU, 2008
LOCATION-AWARE REPLICATION IN VIRTUAL HADOOP ENVIRONMENT A Thesis by UdayKiran RajuladeviKasi Bachelor of Technology, JNTU, 2008 Submitted to Department of Electrical Engineering and Computer Science and
More informationhttp://www.wordle.net/
Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an open-source software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationIJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY
IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY Hadoop Distributed File System: What and Why? Ashwini Dhruva Nikam, Computer Science & Engineering, J.D.I.E.T., Yavatmal. Maharashtra,
More informationLog Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com
More informationPerformance Analysis of Book Recommendation System on Hadoop Platform
Performance Analysis of Book Recommendation System on Hadoop Platform Sugandha Bhatia #1, Surbhi Sehgal #2, Seema Sharma #3 Department of Computer Science & Engineering, Amity School of Engineering & Technology,
More informationA Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud
A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud Thuy D. Nguyen, Cynthia E. Irvine, Jean Khosalim Department of Computer Science Ground System Architectures Workshop
More informationHFAA: A Generic Socket API for Hadoop File Systems
HFAA: A Generic Socket API for Hadoop File Systems Adam Yee University of the Pacific Stockton, CA adamjyee@gmail.com Jeffrey Shafer University of the Pacific Stockton, CA jshafer@pacific.edu ABSTRACT
More informationBig Data Analytics: Hadoop-Map Reduce & NoSQL Databases
Big Data Analytics: Hadoop-Map Reduce & NoSQL Databases Abinav Pothuganti Computer Science and Engineering, CBIT,Hyderabad, Telangana, India Abstract Today, we are surrounded by data like oxygen. The exponential
More informationProblem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis
, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationHadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
More informationParallel Data Mining and Assurance Service Model Using Hadoop in Cloud
Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud Aditya Jadhav, Mahesh Kukreja E-mail: aditya.jadhav27@gmail.com & mr_mahesh_in@yahoo.co.in Abstract : In the information industry,
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationYuji Shirasaki (JVO NAOJ)
Yuji Shirasaki (JVO NAOJ) A big table : 20 billions of photometric data from various survey SDSS, TWOMASS, USNO-b1.0,GSC2.3,Rosat, UKIDSS, SDS(Subaru Deep Survey), VVDS (VLT), GDDS (Gemini), RXTE, GOODS,
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 21 Outline
More informationDesign and Evolution of the Apache Hadoop File System(HDFS)
Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop
More informationEfficient Data Replication Scheme based on Hadoop Distributed File System
, pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,
More informationDeveloping Architectural Documentation for the Hadoop Distributed File System
Developing Architectural Documentation for the Hadoop Distributed File System Len Bass, Rick Kazman, Ipek Ozkaya Software Engineering Institute, Carnegie Mellon University Pittsburgh, Pa 15213 USA lenbass@cmu.edu,{kazman,ozkaya}@sei.cmu.edu
More information