A Hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE


KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, VOL. 6, NO. 11, Nov. 2012. Copyright 2012 KSII

A Hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE

Myoungjin Kim 1, Seungho Han 1, Yun Cui 1, Hanku Lee 1,* and Changsung Jeong 2
1 Department of Internet and Multimedia Engineering, Konkuk University, Gwangjin-gu, Seoul, Republic of Korea
2 Department of Electrical Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea
[{tough105, shhan87, ilycy, hlee}@konkuk.ac.kr, csjeong@korea.ac.kr]
*Corresponding author: Hanku Lee

Received April 9, 2012; revised July 9, 2012; accepted August 16, 2012; published November 30, 2012

Abstract

Previously, we described a social media cloud computing service environment (SMCCSE). This SMCCSE supports the development of social networking services (SNSs) that include audio, image, and video formats. A social media cloud computing PaaS platform, a core component of SMCCSE, processes large amounts of social media in a parallel and distributed manner to support a reliable SNS. Here, we propose a Hadoop-based multimedia system for image and video transcoding processing, a necessary function of our PaaS platform. Our system consists of two modules: an image transcoding module and a video transcoding module. We also design and implement the system by using the MapReduce framework running on the Hadoop Distributed File System (HDFS) and the media processing libraries Xuggler and JAI. In this way, our system substantially reduces the encoding time for transcoding large amounts of image and video files into specific formats depending on user-requested options (such as resolution, bit rate, and frame rate). In order to evaluate system performance, we measure the total image and video transcoding time for image and video data sets, respectively, under various experimental conditions. In addition, we compare the video transcoding performance of our cloud-based approach with that of the traditional frame-level parallel processing-based approach. Based on experiments performed on a 28-node cluster, the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality.

Keywords: Hadoop, MapReduce, multimedia transcoding, cloud computing, PaaS

A preliminary version of this paper appeared in ICONI (International Conference on Internet) 2011, December 15-19, 2011. This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) [NIPA-2012-H].

1. Introduction

In these days of rapid technological change, cloud computing [20][22][24][31][32] has attracted remarkable interest from researchers and the IT industry for providing a flexible, dynamic IT infrastructure, QoS-guaranteed computing environments, and configurable software services [11]. Due to these advantages, many service providers who release Social Network Services (SNSs) [8][12][14] utilize cloud computing in order to reduce the maintenance costs of building and expanding the computing resources needed to process large amounts of social media data, such as video, image, and text formats.

In order to develop an SNS based on large amounts of social media, scalable mass storage for the social media data created daily by users is needed. For example, the amount of data generated by Twitter every day reaches up to 7 TB. Facebook also produces around 10 TB, because media files have recently changed from low-capacity, low-definition formats to high-capacity, high-definition formats. In addition to transferring social media data to end users, media transcoding approaches [3][7][9] are required for delivering a variety of video data in multiple formats to heterogeneous mobile devices. Moreover, distributed and parallel data processing models such as Hadoop [2][13], Google's MapReduce model [1][21], and the Message Passing Interface (MPI) standard [4] are also needed for data processing in a parallel and distributed computing environment.

In an earlier publication [5], we described the new concept of a Social Media Cloud Computing Service Environment (SMCCSE) that brings together cloud computing technologies, approaches to the intensive use of computing resources, and cloud services for developing social media-based SNSs. In particular, we focused on designing a social media PaaS platform as the core platform of our SMCCSE in [17]. The main role of the social media PaaS platform is to provide a distributed and parallel processing system for media transcoding functions and for delivering social media, including video and audio files, to heterogeneous devices such as smart phones, personal computers, televisions, and smart pads. This platform is composed of three parts: a social media data analysis platform for large-scale data analysis; a cloud distributed and parallel data processing platform for storing, distributing, and processing social media data; and finally, a cloud infra management platform for managing and monitoring computing resources.

In this paper, we focus on designing and implementing a Hadoop-based multimedia transcoding system for delivering image and video files in an SNS, by adopting the social media PaaS platform of SMCCSE. Our transcoding system consists of two modules: a video transcoding module for converting video data and an image transcoding module for converting image data. The video transcoding module can transcode a variety of video coding formats into the MPEG-4 video format, and the image transcoding module can transcode large image data sets into a specific format.

In traditional multimedia transcoding, many researchers have focused on distributed and cluster-based video transcoding approaches, such as those found in [6][15][28], that reduce processing time and the maintenance costs of building a computing resource infrastructure. However, these approaches focus on procuring computing resources for a video transcoding process by simply increasing the number of cluster machines in a parallel and distributed computing environment. In addition, they do not consider load balancing, fault tolerance, or data replication methods that ensure data protection and expedite recovery.

Furthermore, there has been limited progress in research related to the splitting and merging policies that are considered significant in distributed transcoding. In order to overcome these limitations, we apply a cloud computing environment to our Hadoop-based multimedia transcoding system. Improvements in quality and speed are achieved by adopting the Hadoop Distributed File System (HDFS) [18][29] for storing the large amounts of video data created by numerous users, MapReduce [10] for distributed and parallel processing of video data, and Xuggler [25] and Java Advanced Imaging (JAI) [27] for open source-based media transcoding. In addition, our system improves distributed processing capabilities and simplifies system design and implementation by incorporating the data replication, fault tolerance, load balancing, and file splitting and merging policies provided by Hadoop.

Our paper is organized as follows: in Section 2, we describe the basic ideas of cloud computing, HDFS, MapReduce, and media transcoding approaches. In Section 3, we introduce the PaaS platform in SMCCSE, describing its three sub-platforms in detail. In Section 4, we propose a Hadoop-based media transcoding system in the PaaS platform. Design and implementation strategies for our system are provided in Section 5. In Section 6, we discuss the results of several experiments conducted on our cloud cluster, presenting the optimal Hadoop options suitable for media transcoding; in addition, we compare the transcoding performance of our Hadoop-based transcoding approach implemented in Java with that of the traditional frame-based parallel transcoding approach implemented in C and C++. Section 7 comprises the conclusion and potential future research.

2. Related Work

2.1 Hadoop and MapReduce

Hadoop, inspired by Google's MapReduce and the Google File System [33], is a software framework that supports data-intensive distributed applications handling thousands of nodes and petabytes of data. It can perform scalable and timely analytical processing of large data sets to extract useful information. Hadoop consists of two important frameworks: 1) the Hadoop Distributed File System (HDFS), like GFS, a distributed, scalable, and portable file system written in Java; and 2) MapReduce, a framework, first developed by Google, for processing large data sets. The MapReduce framework provides a specific programming model and a run-time system for processing and creating large data sets amenable to various real-world tasks [30]. This framework also handles automatic scheduling, communication, and synchronization for processing huge data sets, and it has fault tolerance capability. The MapReduce programming model is executed in two main steps called mapping and reducing, which are defined by mapper and reducer functions. Each phase has a list of key and value pairs as input and output. In the mapping step, MapReduce receives the input data sets and feeds each data element to the mapper in the form of key and value pairs. In the reducing step, all the outputs from the mapper are processed, and the final result is generated by the reducer using a merging process.

2.2 Media transcoding

The term media transcoding is defined in many publications, such as [23][29]. According to [11], to bring multimedia content and services to numerous heterogeneous client devices while retaining the ability to go mobile, multimedia information must be adapted; this adaptation is referred to as media transcoding technology.

Fig. 1 illustrates the architecture of a legacy transcoding system. First, the client requests a transcoding function from a transcoding server. The transcoding server reads the original media data from the media server and transcodes the data according to the user-requested resolution, bit rate, and frame rate. The transcoding server then sends the transcoded media data to the client [30]. However, this media transcoding processing imposes a heavy burden on the existing internet infrastructure and computing resources, because more recent media files, such as video and image files, have changed to high-capacity, high-definition formats.

Fig. 1. The architecture of a legacy transcoding system [30]

Therefore, many researchers apply distributed and parallel computing to media transcoding methods. Jiani Guo et al. in [15] have proposed a cluster-based multimedia web server. This team designed and implemented a media cluster that dynamically generates video units in order to satisfy the bit rates requested by many clients, and proposed seven load-balancing scheduling schemes for the MPEG transcoding service. Y. Sambe et al. in [16] designed and implemented a distributed video transcoding system able to transcode an MPEG-2 video file into diverse video formats at different rates. The main idea behind transcoding a video file is that the transcoder chunks the MPEG-2 video file into small segments along the time axis and transcodes them in a parallel and distributed manner. Zhiqiang et al. in [19] described a cluster-based transcoder that can transcode MPEG-2 format video files into MPEG-4 and H.264 format video files at a faster transcoding speed. This system is composed of a master node and a number of worker nodes. The master node consists of six threads: a splitter, merger, sender, receiver, scheduler, and an audio transcoder.

3. Brief Overview of the PaaS Platform of SMCCSE

In this section, we briefly review the Social Media Cloud Computing Service Environment (SMCCSE), focusing on describing the social media PaaS platform.

3.1 SMCCSE (Social Media Cloud Computing Service Environment)

Our SMCCSE has a multiple service model that uses cloud computing to support SNSs such as Twitter and Facebook, social media services such as YouTube, and social game services like the social network games in Facebook.

First, our service model offers social media APIs, a web-based social SDK, and service delivery platforms for developing SNSs, in the form of SaaS. Second, in order to provide reliable social media services to users, the service model also provides a distributed and parallel data processing platform for storing, distributing, and en/decoding large amounts of social data (including audio, video, and image formats), in the form of PaaS. Finally, the service model provides an IaaS based on virtualization in order to reduce the cost associated with building computing resources, such as servers, storage, and so on. Fig. 2 summarizes this Social Media Cloud Computing service model.

Here we introduce the SMCCSE architecture. Designing SMCCSE involves establishing an environment for supporting the development of SNSs, addressing numerous SNSs, providing approaches for processing large amounts of social media data, and providing a set of mechanisms to manage the entire infrastructure.

Fig. 2. Social Media Cloud Computing Service model

3.2 PaaS Platform in SMCCSE

The PaaS platform is the core platform of SMCCSE, while the IaaS provides the physical computing environment. Fig. 3 shows the whole architecture of the PaaS platform in SMCCSE.

3.2.1 Social Media Data Analysis Platform

The role of the social media data analysis platform is to analyze social media data, including text, images, audio, and video, and to provide various libraries that perform the functions of encoding, decoding, transcoding, and transmoding these different formats. In social media-based SNSs, the analysis of social media data is one of the most important elements for offering reliable services to users. In order to recommend and offer social media of specific types to users, our platform analyzes the usage patterns, types, and correlations of the social media shared, created, and published by users in advance. The other key function of the social media data

analysis platform is to provide a user-friendly interface that conducts the transcoding and transmoding functions so that users can easily create, share, and upload social media, especially image and video content, via the social media common algorithm libraries.

Fig. 3. Social Media Cloud Computing PaaS Platform

3.2.2 Cloud Distributed and Parallel Data Processing Platform

The main function of the cloud distributed and parallel data processing platform, a core platform in SMCCSE, is to store, distribute, and process large amounts of social media data. The social media data are then transferred to user devices, such as mobile phones, smart pads, PCs, TVs, etc. The distributed and parallel data processing system is composed of two systems: a distributed data system and a distributed parallel processing system. The distributed data system adopts HDFS as the distributed file system and HBase (Hadoop Database) as the distributed DB system. In addition, we select MapReduce as the distributed parallel programming model. In practice, this platform carries out the functions provided by the social media common algorithm libraries in the social media data analysis platform.

First, newly created social media data (text, images, audio, and video) are stored on HDFS. Stored data are processed in two steps using MapReduce. In the first step, our platform conducts analysis work for the execution of each core logic defined by the social media APIs in the SaaS platform. For instance, if the SaaS platform defines a social media API that shows a list of the video clips a particular group of users has seen most, MapReduce analyzes the social media data and returns the result to the social media API in order to provide the list to requestors. In the second step, encoding, decoding, transcoding, and transmoding functions are carried out to serve a QoS-guaranteed service to hundreds of heterogeneous smart devices. Traditional approaches to media conversion are very time-intensive. However, our platform has reduced the media conversion time by using enabling cloud computing technologies and

large scalable computing resources in a cloud computing environment. Using the fixed-size policy [26][34], a traditional file splitting technique, the content of a single image is split into small chunks and stored in HDFS. Each chunk is then encoded in parallel and subsequently combined into a single file again using the MapReduce programming model. MapReduce thus reduces the run time of the encoding work. The transcoding and transmoding functions are carried out using the same approach.

3.2.3 Cloud Infra Management Platform

The Cloud Infra Management Platform consists of cloud QoS, Green IDC, and Cloud Infra Management. Cloud Infra Management manages and monitors computing resources without depending on a specific OS or platform. It includes resource scheduling, resource information management, resource monitoring, and virtual machine management functions. These functions are provided as a web service based on Eucalyptus.

4. Hadoop-based Multimedia Transcoding System in the PaaS of SMCCSE

Media transcoding is a very important function in the PaaS platform of SMCCSE for delivering social media content. Hence, we design and implement a Hadoop-based multimedia transcoding system, including image and video transcoding modules, by utilizing the PaaS platform scheme of SMCCSE.

4.1 Overall System Architecture

The core processing for video and image transcoding is briefly explained as follows. The proposed system uses HDFS as storage for distributed parallel processing. The extremely large amount of collected data is automatically distributed over the data nodes of HDFS. For distributed parallel processing, the proposed system exploits the Hadoop MapReduce framework. In addition, the Xuggler libraries for video resizing and encoding, as well as the JAI libraries for images, are utilized in the Mapper. The map function processes each chunk of video data in a distributed and parallel manner. Fig. 4 illustrates the overall architecture of the Hadoop-based multimedia transcoding system in the PaaS platform.

4.2 Hadoop-based Multimedia Transcoding System Architectural Components

Our system is mainly divided into four domains: the Video and Image Data Collection Domain (VIDCD), the HDFS-based Splitting and Merging Domain (HbSMD), the MapReduce-based Transcoding Domain (MbTD), and the Cloud-based Infrastructure Domain (CbID).

4.2.1 VIDCD

The main contribution of VIDCD is the collection of the different types of original encoded video and image files created by media creators, such as SNS providers, media sharing services, and personal users, and the storage of these files on our local file system. It also collects the transcoded video and image data sets converted to target-format files through the MapReduce-based transcoding step in MbTD, and stores them on the local file system. The period for collecting original encoded video and image data sets can be set by administrators and users according to data set size and acquisition time.

4.2.2 HbSMD

The main role of HbSMD, which runs on HDFS, is to split the collected original video and image data sets into blocks of a configured size, and to automatically distribute all blocks over the cluster. In HbSMD, the default block size is set to 64 MB, but it can be changed by administrators and users to various other values, such as 16 MB, 32 MB, 128 MB, 256 MB, etc. When a block

is distributed, it is replicated to three data nodes according to the Hadoop distribution policy, thus complying with the entire distributed processing procedure and enabling recovery from system failures caused by data loss. The other role of HbSMD is to merge the blocks transcoded by the transcoders in MbTD into target video and image files, and to transmit them to VIDCD. The number of block replicas can also be set to other values, such as 1, 2, 4, or 5.

4.2.3 MbTD

MbTD performs several tasks that transcode the distributed blocks on each data node by using a MapReduce-based transcoding module with Xuggler and JAI. Each data node and its corresponding transcoder (e.g., data node 1 and transcoder 1) are located on the same physical machine. First, the transcoders implement the decoding step. Next, the resizing step is implemented if the users and administrators require a change in the resolution of a video or image file; if such a change is not required, the transcoders skip this step. Finally, the transcoders encode the decoded blocks into a target file based on the requirements of the user.

4.2.4 CbID

CbID offers infrastructure services in a cloud computing environment via server, storage, CPU, and network virtualization techniques. Because of the massive storage space and enormous computing resource requirements of such systems, small service vendors are unable to afford the cost of building them. When users require logical computing resources to build and implement this system, CbID automatically deploys a virtualized cluster environment. CbID allows users to select a specific configuration of memory, CPU, storage, and the number of clusters. In addition, it provides an easy installation and configuration environment for HDFS and MapReduce without much effort from the user. In this paper, we present the idea and concept of CbID; its implementation is not considered.

Fig. 4. Overall architecture of the Hadoop-based multimedia transcoding system in the PaaS platform
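To make the HbSMD storage policies above concrete, the following minimal sketch (our own illustration, not code from the paper) shows how a client can write a collected video file into HDFS with an explicit 64 MB block size and a replication factor of 3, using the standard Hadoop FileSystem API; the local and HDFS path names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path dst = new Path("/smccse/videos/input.avi"); // hypothetical HDFS destination
        short replication = 3;                           // default replication factor in the paper
        long blockSize = 64L * 1024 * 1024;              // default 64 MB block size in the paper
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // Create the HDFS file with an explicit replication factor and block size,
        // overriding the cluster-wide defaults for this file only.
        FSDataOutputStream out = fs.create(dst, true, bufferSize, replication, blockSize);
        InputStream in = new FileInputStream("input.avi"); // hypothetical local source file
        IOUtils.copyBytes(in, out, bufferSize, true);      // copies, then closes both streams
    }
}
```

Setting these values per file, as here, leaves the cluster-wide dfs.block.size and dfs.replication defaults untouched, which matches how HbSMD lets administrators and users vary both options.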

5. Design and Implementation

In this section, we discuss several design features of our Hadoop-based multimedia transcoding system. We describe the design and implementation of MbTD, which is responsible for image and video data processing through the MapReduce framework. The image and video transcoding modules are also designed based on the MapReduce framework. The MapReduce framework provides the FileInputFormat and FileOutputFormat classes for processing petabyte-scale text data in a parallel and distributed manner. However, these classes are not suitable for processing media data. Therefore, we designed new classes (ImageInputFormat, VideoFileInputFormat, ImageOutputFormat, and VideoFileOutputFormat) suitable for performing image and video transcoding functions within the MapReduce framework. We also discuss implementation issues for MbTD based on Xuggler and JAI, as described in the previous section.

5.1 Design Strategy

We first discuss the design strategy of our image transcoding module. FileInputFormat is responsible for transferring data stored in HDFS to the mapper; it is designed to read text data line by line. However, since our system deals with media data, such as images and videos, rather than text data, we design a new ImageInputFormat class that can read image data by transforming it into byte stream form. In addition, the ImageRecordReader class is designed to read one record transformed into byte stream form by ImageInputFormat and pass it to the mapper. When a record is transferred to the mapper, it is composed of an image file name and a byte stream of the image file as a key and value pair. For FileOutputFormat, the record is in the same state as for FileInputFormat, explained above. Hence, we also design new ImageOutputFormat and ImageRecordWriter classes that receive the key and value pairs in record form created as the result of the mapper and reducer, and subsequently output the records to a specified directory. Fig. 5 illustrates the four class diagrams for the image conversion function. Although most MapReduce applications produce a result via a mapper and/or a reducer, we implement only the mapper in our system, since it is unnecessary to reduce a set of intermediate values using a reducer. Moreover, the ImageResizer class, which uses the JAI libraries, is designed to perform the transcoding function by processing the key and value pairs in the mapper phase. Fig. 6 shows the ImageConversion (with main()), Map, and ImageResizer class diagrams.
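As an illustration of the design just described, the following sketch shows one way the ImageInputFormat and ImageRecordReader pair could be realized on the Hadoop MapReduce API. The paper does not list its source code, so the generic types and member names here are our assumptions; the only behavior taken from the text is that each record carries the image file name as the key and the whole file as a byte stream value, with files never split.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/** Reads each image file as a single (file name, byte stream) record. */
public class ImageInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // an image must be handled as one whole record
    }

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new ImageRecordReader();
    }

    public static class ImageRecordReader extends RecordReader<Text, BytesWritable> {
        private FileSplit split;
        private TaskAttemptContext context;
        private Text key;
        private BytesWritable value;
        private boolean done = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (done) return false;
            Path path = split.getPath();
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            byte[] bytes = new byte[(int) split.getLength()];
            FSDataInputStream in = fs.open(path);
            try {
                IOUtils.readFully(in, bytes, 0, bytes.length); // whole file into memory
            } finally {
                in.close();
            }
            key = new Text(path.getName());      // key: image file name
            value = new BytesWritable(bytes);    // value: image byte stream
            done = true;
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return done ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}
```

The VideoFileInputFormat and VideoRecordReader classes described next would follow the same pattern for video records.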

Fig. 5. New class diagrams designed for the image conversion function (ImageInputFormat, ImageOutputFormat, ImageRecordReader, and ImageRecordWriter)

Fig. 6. ImageConversion, Map, and ImageResizer class diagrams

The design of the video conversion module is very similar to that of the image conversion module. We design the VideoFileInputFormat and VideoRecordReader classes, which receive a video file from HDFS as key and value pairs compatible with the MapReduce framework. We also design the VideoFileOutputFormat and VideoRecordWriter classes, which write output data to HDFS. The Resizer, MyVideoListener, and VideoConverter classes are responsible for video transcoding processing in the video conversion module. These three classes are designed using the Xuggler libraries. The convertVideo method in VideoConverter transcodes the input video data, in byte stream form, according to the file size and format set by the users and administrators. Fig. 7 illustrates the four class diagrams for the video conversion function, and Fig. 8 shows the Resizer, MyVideoListener, VideoConverter, VideoConversion (with main()), and Map class diagrams.
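For reference, the core of a Xuggler-based conversion like the one VideoConverter performs can be sketched with Xuggler's mediatool API as follows. This is a minimal illustration with hypothetical file names, not the paper's actual implementation, and it omits the resizing step handled by the Resizer and MyVideoListener classes.

```java
import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;

public class SimpleTranscode {
    public static void main(String[] args) {
        // Decode the source container (e.g., Xvid video in an AVI container).
        IMediaReader reader = ToolFactory.makeReader("input.avi");
        // The writer derives the output container and codecs from the ".mp4"
        // extension and copies the stream layout from the reader.
        IMediaWriter writer = ToolFactory.makeWriter("output.mp4", reader);
        // Every decoded packet the reader produces is re-encoded by the writer.
        reader.addListener(writer);
        while (reader.readPacket() == null) {
            // keep pumping packets until end of file
        }
    }
}
```

In the actual module, a listener inserted between the reader and writer (such as a subclass of Xuggler's MediaToolAdapter) would perform the resizing step on each decoded picture.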

Fig. 7. Newly designed class diagrams for the video conversion function (VideoFileInputFormat, VideoFileOutputFormat, VideoRecordReader, and VideoRecordWriter)

Fig. 8. Class diagrams for transcoding processing (Resizer, MyVideoListener, VideoConverter, VideoConversion, and Map)

5.2 Implementation

In this section, we focus on a detailed description of the implementation of MbTD, the component that plays an important role in our system. MbTD is responsible for processing video data through the MapReduce framework. Fig. 9 shows the detailed MapReduce-based programming strategy of MbTD.

First, we implement the InputFormat that transfers the original video data sets stored on HDFS. The InputFormat plays two significant roles. The first role is to provide information about the number of map tasks to the MapReduce framework in advance, so that the map tasks can be prescheduled in the MapReduce framework. The second role is to read records from the original video data sets and transfer them to the map() function of the Map class. This function is performed by the RecordReader. The RecordReader provided by FileInputFormat is designed to read one line from a source file and pass it to map(). Next, we implement the Mapper to process each record received from the RecordReader. The Mapper receives a video file name and a byte stream of the video file as a key and value pair from the RecordReader. The key and value pairs are processed by map() in a parallel and distributed manner. This transcoding processing is carried out by the Xuggler and JAI media libraries. The result of the completed processing is transmitted in the form of converted key and value pairs to the OutputFormat. Finally, we implement the OutputFormat, which writes the completed output data processed by the Mapper to HDFS; that is, the OutputFormat rewrites the key and value pairs as a file on HDFS. In this paper, we implemented MbTD, the main component of our system, using the implementation strategy explained above.

Fig. 9. The MapReduce-based programming strategy in MbTD
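To make this wiring concrete, here is a minimal sketch of a map-only transcoding job in the spirit of the strategy above; it is our illustration rather than the paper's listing. For brevity it resizes each image record with the standard javax.imageio and java.awt APIs instead of JAI and Xuggler, reuses the ImageInputFormat sketched earlier, and writes the converted records to a SequenceFile as a stand-in for the paper's ImageOutputFormat.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

import javax.imageio.ImageIO;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ImageConversion {

    /** Map-only task: decode the image bytes, resize, and re-encode as PNG. */
    public static class Map extends Mapper<Text, BytesWritable, Text, BytesWritable> {
        @Override
        protected void map(Text key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            byte[] src = Arrays.copyOf(value.getBytes(), value.getLength());
            BufferedImage in = ImageIO.read(new ByteArrayInputStream(src));
            BufferedImage out = new BufferedImage(320, 240, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = out.createGraphics();
            g.drawImage(in, 0, 0, 320, 240, null); // scale to the target resolution
            g.dispose();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ImageIO.write(out, "png", buf);        // target format: PNG
            context.write(key, new BytesWritable(buf.toByteArray()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "image transcoding");
        job.setJarByClass(ImageConversion.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0);                  // map-only, as in the paper's design
        job.setInputFormatClass(ImageInputFormat.class);       // sketched earlier
        job.setOutputFormatClass(SequenceFileOutputFormat.class); // stand-in for ImageOutputFormat
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Setting the number of reduce tasks to zero makes Hadoop write the mapper output directly, which matches the paper's decision to implement only the mapper.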

6. Performance Evaluation

In this section, we present the experimental results, describing the optimal Hadoop options for processing image and video files. Moreover, we explain the hardware specifications of our cloud cluster, the image and video data sets used in the performance evaluation, and the experimental methods for measuring media transcoding time.

6.1 Experiment Environment

Our performance evaluation is performed on a single enterprise-scale cluster composed of 1 master node and 27 computational nodes (called data nodes in HDFS). The only way to access the cluster is through the master node. Each node runs the Linux OS (Ubuntu LTS) and is equipped with two Intel Xeon 4-core 2.13 GHz processors, 4 GB of registered ECC DDR memory, and a 1 TB SATA-2 disk. All nodes are interconnected by a 100 Mbps Ethernet adapter. We use Java 1.6.0_23 and Hadoop, with Xuggler 3.4 for video transcoding and JAI for image transcoding. Because the structure of the cluster is homogeneous, it provides a uniform evaluation environment.

In order to verify the performance of our transcoding functions, including image and video transcoding processing, we use six video data sets (Table 1), composed of 200 MB video files, and six image data sets (Table 1), composed of image files of approximately 20 MB.

Table 1. Image and video data sets for performance evaluation (six video data sets and six image data sets of 1, 2, 4, 8, 10, and 50 GB, together with the number of files in each set)

Table 2. Parameters for each original and transcoded video file

Parameter    Original video file    Transcoded video file
Codec        Xvid                   MPEG-4
Container    AVI                    MP4
Size         200 MB                 60 MB
Duration     3 min 19 s             3 min 19 s
Resolution   1280 x 720             320 x 240

We measure the total transcoding time of the image and video transcoding modules. For evaluating the image transcoding function, we focus on measuring the total time to transcode large image data sets (JPG files) into a specific format (PNG files). For evaluating the video transcoding function, we measure the encoding time to transcode large video files into target files. The parameters for each original and target transcoded video file are listed in Table 2. During the experiments, the following default Hadoop options are used: (1) the number of block replications is set to 3; (2) the block size is set to 64 MB. In order to verify the efficiency of our system, we conduct three sets of experiments: (1) examine how a change in cluster size affects performance speedup, (2) explore different Hadoop options for different block sizes

(32, 64, 128, 256, and 512 MB), and (3) explore different Hadoop options for different block replication factors (1, 2, 3, 4, and 5).

6.2 Changing Cluster Size for Speedup Performance

In the first set of experiments, we measure the total transcoding time of the image and video transcoding functions under varying cluster sizes of 1, 4, 8, 12, 16, 20, 24, and 28 nodes, using the Hadoop default options explained in the experiment environment section.

Table 3. Total image and video transcoding time (s) and speedup for each data set size (1, 2, 4, 8, 10, and 50 GB) at cluster sizes of 1, 4, 8, 12, 16, 20, 24, and 28 nodes

We also conduct parallel speedup measurements. Parallel speedup refers to how many times faster the parallel and distributed executions are compared to running the transcoding

functions implemented with the same MapReduce program on a single node. If the speedup is greater than 1, there is at least some gain from carrying out the work in parallel. If the speedup equals the number of machines, our cloud server and MapReduce program have perfect scalability and ideal performance. Speedup is defined as: Speedup(n) = transcoding time on 1 node / transcoding time on n nodes.

Table 3 shows the transcoding time as a function of cluster size, together with the speedup. Fig. 10(a) shows the result of the experiment for each cluster size in the image transcoding module, and Fig. 10(c) illustrates the speedup results for the same module. Fig. 10(b) and Fig. 10(d) show the effect of cluster size and the speedup in the video transcoding module, respectively. According to Table 3, our Hadoop-based media transcoding system shows excellent performance in the image and video transcoding functions for very large image and video files. For example, with image transcoding, our system takes 428 s (approximately 7 min) to perform the image transcoding processing for 50 GB on 28 nodes. For video transcoding, it takes 1623 s (approximately 27 min) under the same conditions.

Fig. 10. (a) Transcoding time versus cluster size using the image transcoding module and (b) the video transcoding module; (c) speedup versus cluster size in the image transcoding module and (d) in the video transcoding module

From Fig. 10(a) and (b), for 4, 8, and 12 nodes the running time decreases dramatically, and from 16 to 28 nodes the transcoding time decreases gradually in both modules. From Fig. 10(c) and (d), the speedup performance for the 10 and 50 GB data sets is higher than that for the 1, 2, 4, and 8 GB data sets, implying that our system exhibits good performance as the size of the data set increases.

6.3 Changing Block Size Factor

For the second set of experiments, we measure the total elapsed time with different Hadoop options with respect to the block size factor (default: 64 MB). Hadoop processes large data sets in a parallel and distributed manner after the data sets are chunked into 64 MB blocks. However, users and programmers can change the block size option in order to improve data processing performance according to the size and type of the unstructured data. Thus, in order to determine the optimal block size condition, we measure the total media transcoding time with five block size options: 32, 64, 128, 256, and 512 MB. Table 4 lists the measured image and video transcoding times in seconds for the different block sizes.

Table 4. Total image and video transcoding time (s) for each data set size (1, 2, 4, 8, 10, and 50 GB) at block sizes of 32, 64, 128, 256, and 512 MB

As shown in Table 4, Fig. 11(a), and Fig. 11(b), there is no difference in performance between the block size options in the image transcoding module (Fig. 11(a)), whereas the video transcoding performance with Hadoop block size options of 256 or 512 MB is better than with 32, 64, and 128 MB. From these results, we find that when the block size option is set to a value greater than or close to the original file size, our system provides good performance for media transcoding processes. In fact, since each video file has a size of 200 MB, the 256 and 512 MB block sizes show the best transcoding performance.

Fig. 11. (a) Total image transcoding time versus data size for various block sizes; (b) total video transcoding time versus data size for various block sizes

6.4 Changing Block Replication Factor

In the third set of experiments, the total transcoding time with different Hadoop options with respect to the block replication factor (default: 3) is measured. When large data sets are stored in HDFS, HDFS splits each data set into fixed-size blocks for quick searching and processing. With the default Hadoop option for block replication, replicated data is stored on three data nodes of HDFS in order to rebalance, move copies around, and continue data replication when system faults such as disk failures or network connection problems occur. Hence, in order to verify how the block replication factor affects performance, we measure the elapsed time to complete media transcoding processing. Five values of the block replication factor, 1, 2, 3, 4, and 5, are used in the experiment. Table 5 lists the measured image and video transcoding times in seconds for the different block replication factor values.

Table 5. Total image and video transcoding time (s) for each data set size (1, 2, 4, 8, 10, and 50 GB) at block replication factors of 1, 2, 3, 4, and 5

According to Table 5, Fig. 12(a), and Fig. 12(b), both modules show the best performance when the block replication factor is set to three. The worst performance occurs when the block replication factor is set to one, since blocks affected by problems must be copied and transferred to a new data node on HDFS when disk failures or data loss occur. If the block replication factor is set to two or more, this processing delay for performing fault tolerance does not occur. In addition, the performance degradation is also caused by the master node rescheduling job tasks in order to recover from system failures.

Fig. 12. (a) Total image transcoding time and (b) total video transcoding time versus data set size for various values of the block replication factor
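As a practical aside, the replication factor studied here does not have to be fixed at write time; it can also be changed for data already stored in HDFS. The following small sketch (our own illustration, with a hypothetical path) uses the standard FileSystem API to do so:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Ask the NameNode to re-replicate an existing file to 3 copies;
        // returns false if the path does not exist or is not a file.
        boolean ok = fs.setReplication(new Path("/smccse/videos/input.avi"), (short) 3);
        System.out.println("replication change accepted: " + ok);
    }
}
```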

Although similar performance is obtained in both modules when the block replication factor is set to more than three, we highly recommend that the block replication factor be set to three, since a larger number of block replicas results in an unnecessary waste of storage space and in more processing time for copying and transferring large numbers of blocks.

6.5 Comparing Two Versions of the Video Transcoding Module

To compare the transcoding performance of our Hadoop-based approach with that of the traditional parallel processing approach, we implemented two versions of the video transcoding module running on the same cluster. The first version is our Hadoop-based transcoding module, whereas the other is the traditional frame-based parallel transcoding module, for which we exploit the Media Encoding Cluster [35]. The Media Encoding Cluster, written in C and C++, is the first open source project that deals with frame-based encoding in a distributed and parallel manner on commodity hardware to reduce the encoding time for a file. That is, to encode original media files into target files, our Hadoop-based transcoding approach splits media files into fixed-size blocks, while the Media Encoding Cluster splits media files into frame units. We test and compare both approaches using the same video data sets. In the case of the Hadoop-based version, we use the default Hadoop options explained in the experiment environment section. Table 6 lists the total transcoding time of the two versions with a speedup calculation that differs from the speedup used in Section 6.2: the speedup used in this section shows how many times faster the Hadoop-based transcoding executions are than the traditional parallel transcoding executions. Speedup is defined as: Speedup = traditional frame-based parallel transcoding time / Hadoop-based transcoding time.

Table 6. Total transcoding time (s) of the two versions of the video transcoding module, with speedup, for data set sizes of 1, 2, 4, 8, 10, and 50 GB

Fig. 13. Comparison of transcoding time between the Hadoop-based transcoding module and the Media Encoding Cluster module (frame-based)

According to Table 6 and Fig. 13, the Hadoop-based transcoding module exhibits better

performance than the Media Encoding Cluster in terms of execution time on all the data sets. For instance, the total transcoding times for completing the transcoding process for 50 GB are approximately 27 min for the Hadoop-based transcoding module and 3 h 20 min for the Media Encoding Cluster, a speedup of approximately 7.4 (12,000 s / 1,623 s). From the speedup results, it is observed that the difference in performance between the two versions increases as the data set size increases. This means that there is much to be gained from our approach when processing data sets of larger sizes. The Media Encoding Cluster exhibits lower performance than our module because it involves greater overhead in the steps for splitting and merging the original media files. Our approach splits original video files into 64 MB blocks, and the blocks are merged after the transcoding process in MbTD, whereas the Media Encoding Cluster splits original video files into a significantly larger number of frame units than the Hadoop-based transcoding module's blocks and merges the chunked frames into target video files. In fact, in the case of a 1 GB data set (200 MB files, about 29 frames/s, 3 min 19 s), our module creates 20 chunked blocks of 64 MB, while the Media Encoding Cluster produces roughly 6,000 chunked frames for each 200 MB file.

7. Conclusion and Future Work

In this paper, we briefly reviewed our social media cloud computing service environment (SMCCSE) and its social media cloud computing PaaS platform. In order to implement social media transcoding functions for transcoding image and video content into a specific format according to user transcoding requirements, we proposed a Hadoop-based multimedia transcoding system in the PaaS platform of SMCCSE. In order to reduce the transcoding time and ensure transcoded image and video quality, we applied the HDFS and MapReduce frameworks, which are emerging technologies in the cloud computing field, to our system. Our system overcomes the difficulties related to splitting and merging policies in distributed video processing, and to fault tolerance and load balancing management in large-scale distributed systems, by obeying Hadoop policies. In the performance evaluation section, we focused on measuring the total transcoding time in various sets of experiments: (1) a change in cluster size for performance speedup, (2) different Hadoop options with respect to block size (32, 64, 128, 256, and 512 MB), and (3) different Hadoop options for different block replication factors (1, 2, 3, 4, and 5). Through these experiments, we verified the excellent performance of our system in media transcoding processing and identified the ideal Hadoop options for it: when the block size option is set to a value greater than or close to the original file size, and the block replication factor is set to three, our system delivers good performance for media transcoding processes. Moreover, in terms of transcoding execution time, our Hadoop-based transcoding approach implemented in Java exhibits better performance than the traditional frame-based parallel approach implemented in C and C++. In the future, we plan to improve the splitting and merging algorithms and the load balancing scheme specific to media transcoding processing in Hadoop. We will also implement distributed video transcoding streaming services optimized for our system.

References

[1] J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol.51, no.1, pp.107-113, Jan. 2008. Article (CrossRef Link)
[2] K. Shvachko, H. Kuang, S. Radia and R. Chansler, The Hadoop distributed file system, in Proc.

of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies, pp.1-10, May 2010. Article (CrossRef Link)
[3] S. Islam and J.-C. Gregoire, Giving users an edge: A flexible cloud model and its application for multimedia, Future Generation Computer Systems, vol.28, no.6, Jun. 2012. Article (CrossRef Link)
[4] W. Gropp, et al., A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, vol.22, no.6, pp.789-828, Sep. 1996. Article (CrossRef Link)
[5] M. Kim and H. Lee, SMCC: Social media cloud computing model for developing SNS based on social media, Communications in Computer and Information Science, vol.206, 2011. Article (CrossRef Link)
[6] G. Barlas, Cluster-based optimized parallel video transcoding, Parallel Computing, vol.38, no.4-5, Apr. 2012. Article (CrossRef Link)
[7] I. Ahmad, X. Wei, Y. Sun and Y.-Q. Zhang, Video transcoding: An overview of various techniques and research issues, IEEE Transactions on Multimedia, vol.7, no.5, pp.793-804, Oct. 2005. Article (CrossRef Link)
[8] Y.-K. Lee, et al., Customer requirements elicitation based on social network service, KSII Transactions on Internet and Information Systems, vol.5, no.10, Oct. 2011. Article (CrossRef Link)
[9] S. Mirri, P. Salomoni and D. Pantieri, RMob: Transcoding rich multimedia contents through web services, in Proc. of the 3rd IEEE Consumer Communications and Networking Conference, vol.2, Jan. 2006. Article (CrossRef Link)
[10] Hadoop MapReduce project, http://hadoop.apache.org/mapreduce/
[11] Z. Lei, Media transcoding for pervasive computing, in Proc. of the 5th ACM Conf. on Multimedia, Oct. Article (CrossRef Link)
[12] D.M. Boyd and N.B. Ellison, Social network sites: Definition, history, and scholarship, Journal of Computer-Mediated Communication, vol.13, no.1, pp.210-230, Oct. 2007. Article (CrossRef Link)
[13] Apache Hadoop project, http://hadoop.apache.org/
[14] C.L. Coyle and H. Vaughn, Social networking: Communication revolution or evolution?, Bell Labs Technical Journal, vol.13, no.2, pp.13-17, Jun. 2008. Article (CrossRef Link)
[15] J. Guo, F. Chen, L. Bhuyan and R. Kumar, A cluster-based active router architecture supporting video/audio stream transcoding service, in Proc. of the Parallel and Distributed Processing Symposium, Apr. 2003. Article (CrossRef Link)
[16] Y. Sambe, S. Watanabe, D. Yu, T. Nakamura and N. Wakamiya, High-speed distributed video transcoding for multiple rates and formats, IEICE Transactions on Information and Systems, vol.E88-D, no.8, Aug. 2005. Article (CrossRef Link)
[17] M.-J. Kim, H. Lee and H. Lee, SMCCSE: PaaS platform for processing large amounts of social media, in Proc. of the 3rd International Conf. on Internet (ICONI), Dec. 2011. Article (CrossRef Link)
[18] J. Shafer, S. Rixner and A.L. Cox, The Hadoop distributed file system: Balancing portability and performance, in Proc. of the IEEE International Symposium on Performance Analysis of Systems and Software, Mar. 2010. Article (CrossRef Link)
[19] Z. Tian, J. Xue, W. Hu, T. Xu and N. Zheng, High performance cluster-based transcoder, in Proc. of the 2010 International Conf. on Computer Application and System Modeling, vol.2, Oct. 2010. Article (CrossRef Link)
[20] R. Buyya, C. Yeo, S. Venugopal and J. Broberg, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems, vol.25, no.6, pp.599-616, Jun. 2009. Article (CrossRef Link)
[21] R. Lammel, Google's MapReduce programming model - Revisited, Science of Computer Programming, vol.68, no.3, pp.208-237, Oct. 2007. Article (CrossRef Link)
[22] M.A. Vouk, Cloud computing - Issues, research and implementations, in Proc. of the 30th International Conf. on Information Technology Interfaces, pp.31-40, Jun. 2008. Article (CrossRef Link)


More information

Facilitating Consistency Check between Specification and Implementation with MapReduce Framework

Facilitating Consistency Check between Specification and Implementation with MapReduce Framework Facilitating Consistency Check between Specification and Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, and Keijiro ARAKI Grad. School of Information Science and Electrical Engineering,

More information

Scalable Multiple NameNodes Hadoop Cloud Storage System

Scalable Multiple NameNodes Hadoop Cloud Storage System Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai

More information

Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

More information

An Open MPI-based Cloud Computing Service Architecture

An Open MPI-based Cloud Computing Service Architecture An Open MPI-based Cloud Computing Service Architecture WEI-MIN JENG and HSIEH-CHE TSAI Department of Computer Science Information Management Soochow University Taipei, Taiwan {wjeng, 00356001}@csim.scu.edu.tw

More information

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING

CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila

More information

Survey on Scheduling Algorithm in MapReduce Framework

Survey on Scheduling Algorithm in MapReduce Framework Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India

More information

Secret Sharing based on XOR for Efficient Data Recovery in Cloud

Secret Sharing based on XOR for Efficient Data Recovery in Cloud Secret Sharing based on XOR for Efficient Data Recovery in Cloud Computing Environment Su-Hyun Kim, Im-Yeong Lee, First Author Division of Computer Software Engineering, Soonchunhyang University, kimsh@sch.ac.kr

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

Processing Large Amounts of Images on Hadoop with OpenCV

Processing Large Amounts of Images on Hadoop with OpenCV Processing Large Amounts of Images on Hadoop with OpenCV Timofei Epanchintsev 1,2 and Andrey Sozykin 1,2 1 IMM UB RAS, Yekaterinburg, Russia, 2 Ural Federal University, Yekaterinburg, Russia {eti,avs}@imm.uran.ru

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Scalable Cloud Computing Solutions for Next Generation Sequencing Data Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Mining Interesting Medical Knowledge from Big Data

Mining Interesting Medical Knowledge from Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

New Cloud Computing Network Architecture Directed At Multimedia

New Cloud Computing Network Architecture Directed At Multimedia 2012 2 nd International Conference on Information Communication and Management (ICICM 2012) IPCSIT vol. 55 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V55.16 New Cloud Computing Network

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal lazymesh@gmail.com,

More information

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Performance of

More information

A Framework for the Design of Cloud Based Collaborative Virtual Environment Architecture

A Framework for the Design of Cloud Based Collaborative Virtual Environment Architecture , March 12-14, 2014, Hong Kong A Framework for the Design of Cloud Based Collaborative Virtual Environment Architecture Abdulsalam Ya u Gital, Abdul Samad Ismail, Min Chen, and Haruna Chiroma, Member,

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),

More information

MapReduce and Hadoop Distributed File System V I J A Y R A O

MapReduce and Hadoop Distributed File System V I J A Y R A O MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB

More information

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

More information

Keywords: Cloudsim, MIPS, Gridlet, Virtual machine, Data center, Simulation, SaaS, PaaS, IaaS, VM. Introduction

Keywords: Cloudsim, MIPS, Gridlet, Virtual machine, Data center, Simulation, SaaS, PaaS, IaaS, VM. Introduction Vol. 3 Issue 1, January-2014, pp: (1-5), Impact Factor: 1.252, Available online at: www.erpublications.com Performance evaluation of cloud application with constant data center configuration and variable

More information

Fault Tolerance in Hadoop for Work Migration

Fault Tolerance in Hadoop for Work Migration 1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous

More information

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Detection of Distributed Denial of Service Attack with Hadoop on Live Network Detection of Distributed Denial of Service Attack with Hadoop on Live Network Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof.Rahul Patil 5 Students, Dept. of Computer, PCCOE,

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

More information

Hadoop on a Low-Budget General Purpose HPC Cluster in Academia

Hadoop on a Low-Budget General Purpose HPC Cluster in Academia Hadoop on a Low-Budget General Purpose HPC Cluster in Academia Paolo Garza, Paolo Margara, Nicolò Nepote, Luigi Grimaudo, and Elio Piccolo Dipartimento di Automatica e Informatica, Politecnico di Torino,

More information

A Hybrid Load Balancing Policy underlying Cloud Computing Environment

A Hybrid Load Balancing Policy underlying Cloud Computing Environment A Hybrid Load Balancing Policy underlying Cloud Computing Environment S.C. WANG, S.C. TSENG, S.S. WANG*, K.Q. YAN* Chaoyang University of Technology 168, Jifeng E. Rd., Wufeng District, Taichung 41349

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues

A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues Rajbir Singh 1, Vivek Sharma 2 1, 2 Assistant Professor, Rayat Institute of Engineering and Information

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

A Prediction-Based Transcoding System for Video Conference in Cloud Computing

A Prediction-Based Transcoding System for Video Conference in Cloud Computing A Prediction-Based Transcoding System for Video Conference in Cloud Computing Yongquan Chen 1 Abstract. We design a transcoding system that can provide dynamic transcoding services for various types of

More information

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Dynamic Resource Pricing on Federated Clouds

Dynamic Resource Pricing on Federated Clouds Dynamic Resource Pricing on Federated Clouds Marian Mihailescu and Yong Meng Teo Department of Computer Science National University of Singapore Computing 1, 13 Computing Drive, Singapore 117417 Email:

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

BSPCloud: A Hybrid Programming Library for Cloud Computing *

BSPCloud: A Hybrid Programming Library for Cloud Computing * BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,

More information

An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment

An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment Daeyong Jung 1, SungHo Chin 1, KwangSik Chung 2, HeonChang Yu 1, JoonMin Gil 3 * 1 Dept. of Computer

More information

Distributed Framework for Data Mining As a Service on Private Cloud

Distributed Framework for Data Mining As a Service on Private Cloud RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

More information

Research on Job Scheduling Algorithm in Hadoop

Research on Job Scheduling Algorithm in Hadoop Journal of Computational Information Systems 7: 6 () 5769-5775 Available at http://www.jofcis.com Research on Job Scheduling Algorithm in Hadoop Yang XIA, Lei WANG, Qiang ZHAO, Gongxuan ZHANG School of

More information

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

Running head: CLOUD COMPUTING 1. Cloud Computing. Daniel Watrous. Management Information Systems. Northwest Nazarene University

Running head: CLOUD COMPUTING 1. Cloud Computing. Daniel Watrous. Management Information Systems. Northwest Nazarene University Running head: CLOUD COMPUTING 1 Cloud Computing Daniel Watrous Management Information Systems Northwest Nazarene University CLOUD COMPUTING 2 Cloud Computing Definition of Cloud Computing Cloud computing

More information

Cloud Computing based Livestock Monitoring and Disease Forecasting System

Cloud Computing based Livestock Monitoring and Disease Forecasting System , pp.313-320 http://dx.doi.org/10.14257/ijsh.2013.7.6.30 Cloud Computing based Livestock Monitoring and Disease Forecasting System Seokkyun Jeong 1, Hoseok Jeong 2, Haengkon Kim 3 and Hyun Yoe 4 1,2,4

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Discovery 2015: Cloud Computing Workshop June 20-24, 2011 Berkeley, CA Introduction to Cloud Computing Keith R. Jackson Lawrence Berkeley National Lab What is it? NIST Definition Cloud computing is a model

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Energy-Saving Cloud Computing Platform Based On Micro-Embedded System

Energy-Saving Cloud Computing Platform Based On Micro-Embedded System Energy-Saving Cloud Computing Platform Based On Micro-Embedded System Wen-Hsu HSIEH *, San-Peng KAO **, Kuang-Hung TAN **, Jiann-Liang CHEN ** * Department of Computer and Communication, De Lin Institute

More information

How To Analyze Log Files In A Web Application On A Hadoop Mapreduce System

How To Analyze Log Files In A Web Application On A Hadoop Mapreduce System Analyzing Web Application Log Files to Find Hit Count Through the Utilization of Hadoop MapReduce in Cloud Computing Environment Sayalee Narkhede Department of Information Technology Maharashtra Institute

More information

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING A SURVEY ON MAPREDUCE IN CLOUD COMPUTING Dr.M.Newlin Rajkumar 1, S.Balachandar 2, Dr.V.Venkatesakumar 3, T.Mahadevan 4 1 Asst. Prof, Dept. of CSE,Anna University Regional Centre, Coimbatore, newlin_rajkumar@yahoo.co.in

More information

The WAMS Power Data Processing based on Hadoop

The WAMS Power Data Processing based on Hadoop Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore The WAMS Power Data Processing based on Hadoop Zhaoyang Qu 1, Shilin

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Viswanath Nandigam Sriram Krishnan Chaitan Baru Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance

More information

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis , 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information