RESEARCH ARTICLE
Adv. Sci. Lett. 4, 400-407, 2011
Copyright 2011 American Scientific Publishers. All rights reserved. Printed in the United States of America.
Advanced Science Letters, Vol. 4, 400-407, 2011

Performance Optimization of a Distributed Transcoding System based on Hadoop for Multimedia Streaming Services

Myoungjin Kim 1, Yun Cui 1, Seungho Han 1, Hanku Lee 1,2,*
1 Department of Internet and Multimedia Engineering, Konkuk University, Seoul 143-701, Korea
2 Center for Social Media Cloud Computing, Konkuk University, Seoul 143-701, Korea

In recent times, significant progress has been achieved in the cost-effective and timely processing of large amounts of data through Hadoop, which is based on the emerging MapReduce framework. Based on these developments, we proposed a Hadoop-based Distributed Video Transcoding System that transcodes large video data sets into specific video formats depending on user-requested options. In order to reduce the transcoding time significantly, we apply a Hadoop Distributed File System and a MapReduce framework to our system. Hadoop and MapReduce are designed to process petabyte-scale text data in a parallel and distributed manner. However, our system processes multimedia data. In this study, we measure the total transcoding time for various values of five MapReduce tuning parameters: block replication factor, Hadoop Distributed File System block size, Java Virtual Machine reuse option, maximum number of map slots, and input/output buffer size. Based on the experimental results, we determine the optimal values of the parameters affecting transcoding processing in order to improve the performance of our Hadoop-based system, which processes a large amount of video data. The results show that the transcoding performance of our system differs notably depending on the values of the MapReduce tuning parameters.

Keywords: Performance Tuning, Distributed Transcoding, MapReduce, Hadoop Optimization, Cloud Computing.
* Email Address: tough105@konkuk.ac.kr
Adv. Sci. Lett. Vol. 4, No. 2, 2011 1936-6612/2011/4/400/008 doi:10.1166/asl.2011.1261

1. INTRODUCTION

In recent times, Hadoop, which is based on the MapReduce model, has gained considerable attention because its data preprocessing techniques are not time-consuming and are suitable for processing large-scale data. In particular, MapReduce is emerging as an important programming model for developing distributed data-processing applications such as web indexing, data mining, log file analysis, financial analysis, and scientific research. These applications process petabyte-scale or terabyte-scale text-based data rather than multimedia data (images, videos, audio).

In order to reduce transcoding time significantly, we proposed a Hadoop-based Distributed Video Transcoding System (HDVTS) based on MapReduce [2] running on a Hadoop Distributed File System (HDFS) [4]. The proposed system is able to transcode a variety of video coding formats into the MPEG-4 video format. Improvements in quality and speed are realized by adopting HDFS for storing large amounts of video data created by numerous users, MapReduce for distributed and parallel processing of video data, and the open-source Xuggler libraries for transcoding.

Performance optimization in distributed data-processing applications that use MapReduce is considered an important research topic. Many studies have focused on three approaches for tuning MapReduce programming: setting tuning parameters for job and task configuration in MapReduce [3,8], improving the existing scheduling strategies in Hadoop [5], and optimizing MapReduce in heterogeneous clusters [7].

MapReduce is widely utilized for large-scale text data analysis in the cloud computing environment [6,9,10]. Several MapReduce tuning parameters must be set by the users and administrators who operate MapReduce applications. Hence, in order to assist inexperienced administrators, Babu [8] presented techniques that automate the process of setting the tuning parameters for MapReduce programs. In addition, Wang et al. [3] presented MRPerf to facilitate the exploration of the MapReduce design space by analyzing various aspects of a MapReduce setup. Their experiments analyzed a TeraSort job, which is a standard benchmark for evaluating the sorting of terabyte-scale text data. Thus, these performance tuning techniques are applicable only to MapReduce programs that process terabyte-scale or petabyte-scale text data.

However, the MapReduce framework applied to our system processes multimedia data. Hence, optimal tuning parameters for video transcoding in Hadoop must be considered. The job configuration and the task tracker configuration parameters play a significant role in the performance of Hadoop applications. In this study, the optimal values of the parameters affecting the transcoding performance are determined by measuring the total transcoding time for various values of five parameters: dfs.replication, dfs.block.size, mapred.job.reuse.jvm.num.tasks, mapred.tasktracker.map.tasks.maximum, and io.file.buffer.size, representing the block replication factor, block size, Java Virtual Machine (JVM) reuse option, maximum number of map slots, and buffer size, respectively.
The main contribution of our study is to present the out-of-the-box performance of a MapReduce application that processes huge video data sets on our cloud cluster, and to provide the optimal values of the tuning parameters for this media processing system to MapReduce users who are not familiar with the configuration of Hadoop options.

The remainder of this paper is organized as follows: Hadoop and performance tuning are described in Section 2. Section 3 gives an overview of the Hadoop-based Distributed Video Transcoding System (HDVTS). In Section 4, the Hadoop configuration parameters for performance tuning are presented, and the hardware, data sets, and methods used in the experiments are described. In Section 5, the results of several experiments conducted on our cloud cluster are discussed and analyzed. Section 6 contains the conclusion.

2. Hadoop Framework

Hadoop, inspired by Google's MapReduce and Google File System [1], is a software framework that supports data-intensive distributed applications handling thousands of nodes and petabytes of data. It can perform scalable and timely analytical processing of large data sets to extract useful information. Hadoop consists of two important frameworks:
1) Hadoop Distributed File System (HDFS), a scalable and portable file system written in Java.
2) MapReduce, a framework for processing large data sets.

HDFS is a distributed file system that supports applications processing gigabyte-scale to petabyte-scale data sets across a cluster of commodity machines. HDFS has a master-slave architecture, with a master server called the NameNode and slaves called DataNodes. The NameNode controls file operations such as open, close, and rename, and the DataNodes are responsible for the file read and write operations requested by clients.
In order to ensure fault tolerance, HDFS splits large data sets into blocks (default block size: 64 MB) and stores them with replication across the data nodes.

The MapReduce framework provides a specific programming model and a run-time system for processing and creating large data sets amenable to various real-world tasks. This framework also handles automatic scheduling, communication, and synchronization for processing huge data sets, and it is fault tolerant. The MapReduce programming model is executed in two main steps called mapping and reducing, which are defined by mapper and reducer functions. Each phase has a list of key and value pairs as input and output. In the mapping step, MapReduce receives the input data sets and then feeds each data element to the mapper in the form of key and value pairs. In the reducing step, all the outputs from the mapper are processed, and the final result is generated by the reducer through a merging process.

3. Overview of HDVTS

In this section, we briefly describe our proposed system architecture.
1) Our system contains a codec transcoding function and a function for converting video to a different display size, codec method, and container format.
2) Our system mainly focuses on the batch processing of large numbers of video files collected over a fixed period rather than the processing of small video files collected in real time.
3) HDFS is applied to our system in order to avoid the high communication cost of transferring video files for distributed processing. HDFS is also chosen because its large chunk size (64 MB) policy is suitable for processing video files and because it is a user-level distributed system.
4) Our system follows the load balancing, fault tolerance, and merging and splitting policies provided by MapReduce for distributed processing.
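The mapping and reducing steps described in Section 2 can be illustrated with a minimal in-memory sketch applied to chunked video data. This is not the HDVTS implementation and does not use the Hadoop API: the record layout ((file name, chunk index) as key, raw bytes as value), the function names, and the placeholder transcode_chunk are all assumptions made for illustration, and the real transcoding step (Xuggler in our system) is replaced by a trivial byte operation.

```python
from collections import defaultdict


def transcode_chunk(data: bytes) -> bytes:
    """Hypothetical stand-in for real transcoding: keep every third byte."""
    return data[::3]


def mapper(key, value):
    """Map step: transcode one chunk and emit it keyed by its file name."""
    file_name, chunk_index = key
    yield file_name, (chunk_index, transcode_chunk(value))


def reducer(file_name, values):
    """Reduce step: merge the transcoded chunks of one file in index order."""
    merged = b"".join(chunk for _, chunk in sorted(values))
    yield file_name, merged


def run_job(records):
    """Run map, group by key (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = {}
    for file_name in sorted(groups):
        for out_key, out_value in reducer(file_name, groups[file_name]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    # Chunks may arrive in any order; the reducer restores the order by index.
    records = [
        (("a.avi", 1), b"DEFGHI"),
        (("a.avi", 0), b"ABCDEF"),
        (("b.avi", 0), b"xyz"),
    ]
    print(run_job(records))
```

In a real Hadoop job the grouping between the two steps is done by the framework across nodes; the sketch only shows how key and value pairs flow from the mapper to the reducer.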
HDVTS is mainly divided into four domains: Video Data Collection Domain (VDCD), HDFS-based Splitting and Merging Domain (HbSMD), MapReduce-based Transcoding Domain (MbTD), and Cloud-based Infrastructure Service Domain (CbISD).

The core processing for video transcoding is briefly explained as follows. The proposed system uses HDFS as storage for distributed parallel processing. The extremely large amount of collected data is automatically distributed across the data nodes of HDFS. For distributed parallel processing, the proposed system exploits the Hadoop MapReduce framework. In addition, the Xuggler libraries for video resizing and encoding are utilized in the Mapper. The Map function processes each chunk of video data in a distributed and parallel manner. Figure 1 shows the diagram of an HDVTS.

Fig. 1. Diagram of a Hadoop-based Distributed Video Transcoding System

In this prototype, users and administrators can select video transcoding options such as format, codec, bitrate, width, and height, and audio transcoding options such as codec, bitrate, and sample rate. Further, summary information about the system is monitored, including the available storage capacity of HDFS, the activation state of the data nodes, and the progress of the MapReduce job.

4. Hadoop Configuration Parameters for Performance Tuning

The performance of a MapReduce job in Hadoop can be controlled by more than 190 Job, JobTracker, and TaskTracker configuration parameters. We select five parameters that are expected to significantly affect the performance of a transcoding MapReduce job. Table 1 lists the selected subset of tuning parameters, which are used to provide empirical evidence of the performance differences in transcoding time. The configuration of these parameters controls job behavior. Therefore, the adjustment and combination of the parameters must be determined appropriately based on the size and type of the data sets.

The values of dfs.replication and dfs.block.size can be changed via the hdfs-site.xml file in Hadoop. The values of mapred.job.reuse.jvm.num.tasks and mapred.tasktracker.map.tasks.maximum can be adjusted via the mapred-site.xml file. The value of io.file.buffer.size can be modified in the core-site.xml file.
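As a hedged illustration of where each parameter lives, the following sketch renders the property entries for the three configuration files in the standard Hadoop layout (a configuration element containing property, name, and value elements). The values shown are only the Hadoop defaults from Table 1, written in bytes where Hadoop expects a byte count; they are not the tuned values of this study. The helper name render_site_xml and the SETTINGS dictionary are assumptions introduced for this example, and core-site.xml is used as the standard name of the core configuration file.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: the Hadoop defaults listed in Table 1.
SETTINGS = {
    "hdfs-site.xml": {
        "dfs.replication": "3",
        "dfs.block.size": str(64 * 1024 * 1024),  # 64 MB in bytes
    },
    "mapred-site.xml": {
        "mapred.job.reuse.jvm.num.tasks": "1",
        "mapred.tasktracker.map.tasks.maximum": "2",
    },
    "core-site.xml": {
        "io.file.buffer.size": str(4 * 1024),  # 4 KB in bytes
    },
}


def render_site_xml(properties):
    """Build a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for file_name, properties in SETTINGS.items():
        print(file_name)
        print(render_site_xml(properties))
```

Changing one value in SETTINGS and regenerating the corresponding file mirrors how a single parameter is varied while the others keep their defaults.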
Performance evaluation is conducted on a 28-node HDFS cluster consisting of 1 master node and 27 slave nodes (data nodes). Each node runs Linux (CentOS 5.5) and is equipped with two Intel Xeon 4-core 2.13 GHz processors, 4 GB of registered ECC DDR memory, and a 1 TB SATA-2 disk. All nodes are interconnected with 100 Mbps Ethernet adapters. We also use Java 1.6.0_23, Hadoop 1.0.2, and Xuggler 3.4 for video transcoding.

To evaluate the performance of encoding very large amounts of video files into target files, we create and use three video data sets of different sizes (5, 10, and 20 GB). The total time to transcode the original video data sets (Xvid, AVI, 200 MB, 1280×720) into target files (MPEG-4, MP4, 60 MB, 320×240) is measured.
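To make the hardware figures concrete, the short sketch below computes the number of cores per node and the cluster-wide number of concurrent map tasks for each value of mapred.tasktracker.map.tasks.maximum considered in Table 1. It assumes that task trackers run on all 27 slave nodes and not on the master node, which is not stated explicitly here; the function and constant names are hypothetical.

```python
SLAVE_NODES = 27           # data nodes in the cluster
CORES_PER_NODE = 2 * 4     # two 4-core processors per node
MAP_SLOT_VALUES = [2, 4, 8, 12, 16]  # values considered in Table 1


def cluster_map_slots(slots_per_node, nodes=SLAVE_NODES):
    """Upper bound on map tasks running at the same time in the cluster."""
    return slots_per_node * nodes


def slots_per_core(slots_per_node, cores=CORES_PER_NODE):
    """Ratio above 1.0 means more map slots than cores on a node."""
    return slots_per_node / cores


if __name__ == "__main__":
    for slots in MAP_SLOT_VALUES:
        print(
            f"{slots:>2} slots/node: {cluster_map_slots(slots):>3} concurrent "
            f"map tasks, {slots_per_core(slots):.2f} slots per core"
        )
```

Under these assumptions, a value of 8 gives exactly one map slot per core, while 12 and 16 oversubscribe the cores of each node.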
Table 1. A subset of tuning configuration parameters in Hadoop

Parameter Name                          Default Value   Values Considered
dfs.replication                         3               1, 2, 3, 4, 5
dfs.block.size                          64 MB           32 MB, 64 MB, 128 MB, 256 MB, 512 MB
mapred.job.reuse.jvm.num.tasks          1               1, -1
io.file.buffer.size                     4 KB            4 KB, 128 KB, 256 KB, 512 KB, 1024 KB
mapred.tasktracker.map.tasks.maximum    2               2, 4, 8, 12, 16

The transcoding time obtained with the default values of the Hadoop options is compared with the transcoding time obtained with other values of these options. The following default Hadoop option values are used during the experiments: (1) JVM runs in server mode with 1024 MB heap memory for map tasks, (2) JVM reuse option is enabled, (3) HDFS block size is 64 MB, (4) block replication factor is three, and (5) I/O file buffer size is 4 KB.

The optimal values of the tuning parameters are determined by analyzing the transcoding performance in five sets of experiments: (1) the block replication factor is varied (1, 2, 3, 4, 5), (2) the block size is varied (32 MB, 64 MB, 128 MB, 256 MB, 512 MB), (3) the buffer size is varied, (4) the maximum number of map slots per task tracker is varied, and (5) the JVM reuse option is enabled (value: -1) and disabled (value: 1).

5. Evaluation and Results

In this section, we demonstrate the differences in job running times for transcoding depending on appropriate and inappropriate parameter settings in the five sets of experiments. With the default values for the Hadoop options, our system provides excellent transcoding time performance for very large video data sets. For example, according to Figure 2, the transcoding process requires approximately 236 sec (about 4 min), 385 sec (about 6 min), and 696 sec (about 12 min) for the 5 GB, 10 GB, and 20 GB data sets, respectively.

Figures 2 and 3 show the encoding time required to complete the transcoding process for the three data sets. Two tuning parameters, dfs.replication representing the HDFS block replication factor and dfs.block.size representing the HDFS block size, are varied in Figures 2 and 3, respectively, while the other tuning parameters keep their default values. From these figures, a significant impact on transcoding performance is observed. First, when dfs.replication is set to 2 or 3, our system shows an improvement in the transcoding process. Between these two values, the value 2 is preferable because it has a lower storage space requirement for the replicas generated based on dfs.replication. Further, in our system, the performance is better for a dfs.block.size value of 256 MB or 512 MB than for the other values. Thus, this parameter should be set to a value larger than or approximately equal to the original file size, which is 200 MB in our experiments.

Fig. 2. Total transcoding time in Hadoop for 5 GB, 10 GB, and 20 GB data sets, for various values of dfs.replication

Fig. 3. Total transcoding time in Hadoop for 5 GB, 10 GB, and 20 GB data sets, for various values of dfs.block.size
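The effect of the two HDFS parameters can be made concrete with some simple arithmetic. The sketch below counts how many HDFS blocks one 200 MB source file occupies for each block size considered, and how much raw storage each replication factor requires for the three data sets. It assumes the usual HDFS behavior in which a file is cut into blocks of at most dfs.block.size and every block is stored dfs.replication times; the function names are hypothetical, and the mapping from blocks to map tasks in HDVTS is not modeled.

```python
import math

FILE_SIZE_MB = 200                       # size of one source video file
BLOCK_SIZES_MB = [32, 64, 128, 256, 512]
REPLICATION_FACTORS = [1, 2, 3, 4, 5]
DATA_SET_SIZES_GB = [5, 10, 20]


def blocks_per_file(file_mb, block_mb):
    """Number of HDFS blocks needed to hold one file of the given size."""
    return math.ceil(file_mb / block_mb)


def raw_storage_gb(data_gb, replication):
    """Total disk space used when every block is stored `replication` times."""
    return data_gb * replication


if __name__ == "__main__":
    for block_mb in BLOCK_SIZES_MB:
        count = blocks_per_file(FILE_SIZE_MB, block_mb)
        print(f"block size {block_mb:>3} MB: {count} block(s) per 200 MB file")
    for replication in REPLICATION_FACTORS:
        sizes = [raw_storage_gb(gb, replication) for gb in DATA_SET_SIZES_GB]
        print(f"replication {replication}: raw storage {sizes} GB")
```

With a block size of 256 MB or 512 MB, each 200 MB file fits in a single block, which is consistent with the recommendation to choose a block size no smaller than the original file size. A replication factor of 2 stores 40 GB for the 20 GB data set, compared with 60 GB for a factor of 3.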
The io.file.buffer.size parameter sets the size of the buffer that is used to read and write sequence files. In this set of experiments, when the buffer size is set to 128 KB or 256 KB, the performance shows an average improvement of 3% compared with the performance obtained using the default value. However, when the buffer size is set to 512 KB or 1024 KB, the performance improvement varies according to the size of the data sets. The results are shown in Figure 4.

Fig. 4. Total transcoding time in Hadoop for 5 GB, 10 GB, and 20 GB data sets, for various values of io.file.buffer.size

In the fourth set of experiments, we focus on exploring and analyzing different values for mapred.tasktracker.map.tasks.maximum. This option represents the maximum number of map tasks performed simultaneously on a single data node; i.e., if the value is set to 4, four map tasks of the MapReduce job are performed on a single data node simultaneously. Before performing this set of experiments, we expected that the transcoding job performance for the maximum number of map slots would depend on the number of CPU cores in a physical machine, and hence that a value of 8 would exhibit better performance than the other values. Indeed, from the experimental results shown in Figure 5, the best transcoding performance is achieved when the value of this option is set to 8, because our system has eight CPU cores on each node.

Fig. 5. Total transcoding time in Hadoop for 5 GB, 10 GB, and 20 GB data sets, for various values of mapred.tasktracker.map.tasks.maximum

We exploit the inherent features of Hadoop to alleviate the bookkeeping overhead. In particular, we run multiple map tasks in one JVM by using the mapred.job.reuse.jvm.num.tasks parameter. If the JVM reuse option is enabled by setting the value to -1, there is no limit on the number of times that the same JVM can be reused for map tasks. JVM reuse is disabled by setting the value to 1, in which case a JVM is used by only one map task. From Figure 6, although better performance is observed with the value -1 than with the value 1, this difference is negligible. Without reuse, Hadoop runs each map task of a job in its own JVM, and the overhead required by each map task to prepare its JVM is approximately 1 sec. Thus, when the JVM reuse option is enabled for many map tasks having a short life cycle, the performance improves. However, the number of map tasks in a MapReduce job that processes the 5 GB video data set is only 20. Hence, this set of experiments with the 5 GB video data set does not demonstrate a large difference in the transcoding performance.

Fig. 6. Total transcoding time in Hadoop for 5 GB, 10 GB, and 20 GB data sets, for various values of mapred.job.reuse.jvm.num.tasks

6. Conclusions

This study aims to determine the optimal values of the tuning parameters in a Hadoop-based distributed video transcoding system by measuring the total transcoding time for various values of five parameters: block size, block replication factor, JVM reuse factor, maximum number of map slots, and buffer size. From the experiments, it is observed that our system exhibits good performance for media transcoding when the block size is greater than or nearly equal to the original file size, and the block replication factor and JVM reuse factor are configured as 3 and -1, respectively. Furthermore, when the buffer size is set to 128 KB or 256 KB, and the maximum number of map slots is set to a value approximately equal to the number of CPU cores in a single data node, our system delivers good performance for media transcoding.

ACKNOWLEDGMENTS

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1001) supervised by the NIPA (National IT Industry Promotion Agency).

REFERENCES

[1] Ghemawat, S., Gobioff, H., Leung, S.-T. The Google file system, Operating Systems Review (ACM), 37(2003), 29-43.
[2] Dean, J., Ghemawat, S. MapReduce: Simplified data processing on large clusters, Communications of the ACM, 51(2008), 107-113.
[3] Wang, G., Butt, A.R., Pandey, P., Gupta, K. Using realistic simulation for performance analysis of MapReduce setups, Proceedings of 1st ACM Workshop on Large-scale System and Application Performance, Art no. 1552278, 19-26.
[4] Shafer, J., Rixner, S., Cox, A.L. The Hadoop Distributed Filesystem: Balancing Portability and Performance, Proceedings of the 2010 IEEE International Symposium on Performance Analysis of Systems and Software, Art no. 5452045 (2010), 122-133.
[5] Polo, J., Carrera, D., Becerra, Y., Torres, J., Ayguade, E., Steinder, M., Whalley, I. Performance-Driven Task Co-Scheduling for MapReduce Environments, Proceedings of 12th IEEE/IFIP Network Operations and Management, Art no. 5488494 (2010), 373-380.
[6] Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M. A view of cloud computing, Communications of the ACM, 53(2010), 50-58.
[7] Xie, J., Yin, S., Ruan, X., Ding, Z., Tian, Y., Majors, J., Manzanares, A., Qin, X. Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters, Proceedings of 2010 IEEE International Symposium on Parallel and Distributed Processing, Art no. 5470880 (2010).
[8] Babu, S. Towards Automatic Optimization of MapReduce Programs, Proceedings of 1st ACM Symposium on Cloud Computing, (2010), 137-142.
[9] Zhang, Q., Cheng, L., Boutaba, R. Cloud Computing: State-of-the-art and research challenges, Journal of Internet Services and Applications, 1(1)(2010), 7-18.
[10] Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I. Cloud Computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems, 25(6)(2009), 599-616.