Integration of Hadoop Cluster Prototype and Analysis Software for SMB



Similar documents
Concept Design of Testbed based on Cloud Computing for Security Research

Cloud-based Distribute Processing of User-Customized Mobile Interface in U-Sensor Network Environment

Development of CEP System based on Big Data Analysis Techniques and Its Application

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

UPS battery remote monitoring system in cloud computing

On a Hadoop-based Analytics Service System

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution

Scalable Multiple NameNodes Hadoop Cloud Storage System

Feasibility Study of Searchable Image Encryption System of Streaming Service based on Cloud Computing Environment

Comparison of Open Source Cloud System for Small and Medium Sized Enterprises

METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

Research Article Hadoop-Based Distributed Sensor Node Management System

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Cloud Computing based Livestock Monitoring and Disease Forecasting System

Mobile Hybrid Cloud Computing Issues and Solutions

Hadoop Architecture. Part 1

A Database Hadoop Hybrid Approach of Big Data

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

Manifest for Big Data Pig, Hive & Jaql

International Journal of Innovative Research in Computer and Communication Engineering

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Processing Large Amounts of Images on Hadoop with OpenCV

Distributed Framework for Data Mining As a Service on Private Cloud

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL

Open source Google-style large scale data analysis with Hadoop

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

A Study on Data Analysis Process Management System in MapReduce using BPM

Agenda. Big Data. Dell Cloud Solutions A Dell Story Summary. Concepts Market Trends and Challenges Dell Solutions

Review of the Techniques for User Management System

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Big Data Using Cloud Computing

A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud


An Approach to Implement Map Reduce with NoSQL Databases

Telecom Data processing and analysis based on Hadoop

Big Data on Cloud Computing- Security Issues

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Prototype Design of NFC-Based Electronic. Coupon Ecosystem with Object Memory Model

BIG DATA ANALYSIS USING RHADOOP

CLOUD STORAGE USING HADOOP AND PLAY

Big Data - Infrastructure Considerations

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Design of Big Data-based Greenhouse Environment Data Consulting System for Improving Crop Quality

On site big data analysis system model to promote the competiveness of manufacturing enterprises

International Journal of Advance Research in Computer Science and Management Studies

Enabling High performance Big Data platform with RDMA

BIG DATA-AS-A-SERVICE

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

Performance Comparison Analysis of Linux Container and Virtual Machine for Building Cloud

Apache Hadoop new way for the company to store and analyze big data

A Study on Information Technology Plan and Status of University 2013

Constructing a Data Lake: Hadoop and Oracle Database United!

The Greenplum Analytics Workbench

A Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services

Design of Electric Energy Acquisition System on Hadoop

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework

Introduction to Cloud Computing

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Chapter 7. Using Hadoop Cluster and MapReduce

Mr. Apichon Witayangkurn Department of Civil Engineering The University of Tokyo

A High-availability and Fault-tolerant Distributed Data Management Platform for Smart Grid Applications

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Hadoop Big Data for Processing Data and Performing Workload

Oracle Big Data SQL Technical Update

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

VOL. 5, NO. 2, August 2015 ISSN ARPN Journal of Systems and Software AJSS Journal. All rights reserved

Cloud Computing Where ISR Data Will Go for Exploitation

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Efficient Data Replication Scheme based on Hadoop Distributed File System

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

Success factors for the implementation of ERP to the Agricultural Products Processing Center

White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP

GeoGrid Project and Experiences with Hadoop

University Dedicated Next Generation ERP System

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce.

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

NTT DOCOMO Technical Journal. Large-Scale Data Processing Infrastructure for Mobile Spatial Statistics

Big Data Storage Architecture Design in Cloud Computing

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

CSE-E5430 Scalable Cloud Computing Lecture 2

MapReduce Job Processing

Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Autonomic Data Replication in Cloud Environment

A Cost Effective Virtual Cluster with Hadoop Framework for Big Data Analytics

BIG DATA IS MESSY PARTNER WITH SCALABLE

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Reverse Auction-based Resource Allocation Policy for Service Broker in Hybrid Cloud Environment

Click Stream Data Analysis Using Hadoop

Ubuntu and Hadoop: the perfect match

Transcription:

Vol.58 (Clound and Super Computing 2014), pp.1-5 http://dx.doi.org/10.14257/astl.2014.58.01 Integration of Hadoop Cluster Prototype and Analysis Software for SMB Byung-Rae Cha 1, Yoo-Kang Ji 2, Jong-Won Kim 3 1,3 School of Information and Communication GIST, Republic of KOREA 1 brcha@nm.gist.ac.kr, 3 jongwon@gist.ac.kr 2 Huin Tech Co., Gwangju, Republic of KOREA neobacje@gmail.com Abstract. Recently, even to the small and medium business (SMB) companies, the booming adoption of Cloud Computing and Big Data paradigm has been becoming increasingly important. In this paper, in the context of private cloud infrastructure, we introduce our attempt to design the experimental costeffective prototypes for Hadoop cluster, together with its partial realization. Keywords: Big Data and Cloud, Private and personal cloud, Hadoop cluster, Prototyping and verification 1 Introduction The cloud and big data was selected as Gartner's top 10 strategic technology trends for 2012 and the personal cloud was newly added in 2013 (see Fig. 1). For SMB (Small and Medium Business) companies with limited ICT budget, it becomes necessary to construct a small-size private cloud cluster involving big data processing. In particular, three reasons for SMB s adoption of cloud computing were commented in one of 2009 SMB Survey by ENISA[1]. Fig.1. Gartner s top 10 strategic technologies (2010 2014). ISSN: 2287-1233 ASTL Copyright 2014 SERSC

In this paper, by considering the scale-out architecture with the commodity hardware parts, we design a basic-level prototype of Hadoop-enabled cluster for SMB. The designed prototype has following advantages. First, it can easily scale-out by adding computer parts for performance improvement. It also can handle various Hadoop-based tasks, supported by the open-source software modules. Finally, it can support high availability in the contexts of hardware, operation, and applications. 2 Background and Related Work With the trendy term, Big data, we refer to massive data that is beyond normallymanageable size with ordinary database software [2]. It is characterized 3V+1C attributes such as Volume - a wide range of large amounts of data, Velocity - quick speed of data production and flow, Variety - various forms of information, and Complexity - non-structured and complex data [3]. This computing/storage challenge of big-data analysis could be solved by leveraging the cloud computing methodology (i.e., tools). The cloud computing can provide us virtualized infrastructure resources in the form of IaaS, universal software-centric operation platform with PaaS, and/or creative user-customized services with SaaS. It has the advantage of reducing surplus resources for cloud computing providers, and of using necessary resources (or software) independently for end users. Hadoop, the software framework developed in Java language, is the Apache open-source project that allows Big-Data analysis in handling distributed processing [4]. Hadoop consists of HDFS and MapReduce. Although HDFS and MapReduce may physically coexist in a server, HDFS and MapReduce have master/slave architecture. For HDFS, the master is called as NameNode and the slave is called as DataNode. 3 Prototype Design of Hadoop-enabled Cluster The Hadoop-enabled cluster for SMB private cloud, as part of hybrid cloud environment shown in Fig. 2, could be applied to various big-data processing tasks such as LOD (Linked Open Data) [5], MIS (Management Information System), Mahout [6] for data mining, image processing [7, 8], StraaS (Streaming as a Service) [9] and others. Also, it has to provide resource scalability in both aspects of computing and storage. In this conceptual verification stage, the designed Hadoop-enabled cluster consists of three form-factors: basic-level version 0.1, 0.2, and 0.3. They are mainly differentiated by the cost-performance for SMB.As discussed above, we have designed the Hadoop cluster by PC form-factors as shown in Fig, 3, it consists of 4 PCs that serve as one NameNode, and three DataNodes for Hadoop. By removing the cover cases of PCs, as shown in Fig. 3, the prototype of Hadoop cluster combines multiple motherboards in a rack-mount form-factor. This re-organized prototype shows the space-saving connection of one NameNode and three DataNodes, whose motherboards consist of i3 CPU, 4GB RAM, and 320GB hard disk.the performance test is 2 Copyright 2014 SERSC

taken with the proposed Hadoop clusterprototype. The testing is performed with 11GB of US Airline NavigationStatistic Data published by ASA(American Standard Association). The proposed Hadoop cluster prototype showsaround 5~6 minutes performance (correct time: 5m 44s). Fig. 2.Hybrid cloud environment with the proposed scale-out cluster. Fig. 3.Hadoop cluster prototype. 4 Design and Implementation of Big-Data Analysis System In this chapter, we designed the analysis tools integrated and configured software for Big-Data analysis as shown in Fig. 4. AndFig. 5 ~ Fig. 9are presented the Input state of Big-Data, Big-Data in Folder, Convert UI from raw data to JSON data, Hadoop processing and results, and info-graph in web-browser by visualization tool D3. Specially, the convert UI in Fig. 6 supports the various data types of raw data, csv, and JSON in order to compatibility between data of various software tools. Fig. 4. Design of Big-Data analysis system. Fig. 5. UI of Input & Folder for Big-Data processing. Copyright 2014 SERSC 3

Fig. 6. Convert UI for Big-Data processing. Fig. 7. UI of Hadoop processing. Fig. 8. Display result of Hadoop processing. Fig. 9. UI of visualization tool D3. 5 Conclusion In this paper, we designed and prototyped a Hadoop-enabled cluster with nonexpensive commodity hardware parts for SMB. However, it only verifies the feasibility of low-cost small form-factor construction of private cloud. In future, it is highly desirable to explore the federation with any public cloud and to apply the DevOps (Development and Operations) tools to automatically configure it for the targeted bigdata processing tasks and info-graph for visualization. Acknowledgements. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012R1A1A2041274). References 1. ENISA survey, "An SMB perspective on cloud computing," 2009. 2. J. Manyika and M. Chui, Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011. 3. P. Russom, Big data analytics, TDWI Research Fourth Quarter, 2011. 4. Hadoop, http://hadoop.apache.org/ 5. LOD (Linked Open Data), http://www.data.gov/ 6. Mahout, http://mahout.apache.org/ 4 Copyright 2014 SERSC

7. HIPI, http://hipi.cs.virginia.edu/ 8. C.Sweeney, L. Liu, S. Arietta, and J. Lawrence, HIPI: A Hadoop image processing interface for image-based MapReduce tasks, B.S. Thesis. University of Virginia, 2011. 9. B. Cha, S. Park, and Y. Ji, Design of StraaS (Streaming as a Service) based on cloud computing, International Journal of Multimedia and Ubiquitous Engineering, vol. 7, no. 4, Oct. 2012. Copyright 2014 SERSC 5