Open Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform




The Open Automation and Control Systems Journal, 2014, 6, 1463-1467

Zhao Xiaoyong* and Yang Chunrong

Mathematics and Computer Science Institute, XinYu University, JiangXi, 338004, China

Abstract. This paper establishes a mathematical model and algorithm for massive data processing in cloud computing, and introduces the Hadoop distributed computing method into the database management system to realize automatic partitioning of database data and configuration of master and slave nodes. The master-slave node distributed algorithm is implemented in MATLAB to realize distributed data computing, and numerical simulation is used to compute the data processing speed, transmission rate, capacity and other system parameters. Comparing the Hadoop distributed processing algorithm with two traditional data processing algorithms, we find that the Hadoop distributed computing algorithm processes data faster and offers a larger information storage capacity and a higher information transmission speed, which can satisfy the needs of high-volume data processing.

Keywords: Hadoop; Cloud computing; Massive data; Large capacity; Distributed computing; Master slave node

1. INTRODUCTION

With the development of computer, Internet and communication technologies, communication systems need to handle very large volumes of data. Processing massive data consumes a large amount of server resources, and many traditional computing platforms cannot complete the task [1,2]. Drawing on the distributed computing capability of Hadoop, this paper designs a cloud computing data processing platform for high-speed processing of massive data, using cloud computing and cloud storage technology [3].
At the same time, combined with automatic assignment of master and slave nodes, the platform uses an automatically partitioned database to achieve high data capacity and high transmission speed, which provides a new scheme for massive data processing algorithms and data communication technology.

2. OVERVIEW OF HADOOP CLOUD PLATFORM DATABASE MASSIVE DATA PROCESSING AND MINING METHODS

The growth of the personalized Internet generates massive data, and a traditional single super server increasingly falls short in the face of it, so processing massive data has become a thorny problem [4]. The open-source Hadoop cloud platform, developed by the Apache foundation, opens up many possibilities for processing huge amounts of data. This paper uses the Hadoop data processing platform, combined with a distributed data processing algorithm, to establish a massive database processing platform whose main structure is shown in Fig. (1).

Fig. (1) shows the schematic diagram of Hadoop massive data processing. As the figure shows, Hadoop massive data processing mainly consists of the design of the distributed computing algorithm and the master-slave node distribution; together, these two techniques achieve high-speed processing of massive database data.

[Fig. (1). Hadoop massive data processing. Recoverable block labels: Massive database; Hadoop database processing algorithm; Distributed computing method; Master-slave nodes distribution; Completing data processing.]

1874-4443/14 2014 Bentham Open
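The partition, distribute and collect flow outlined in Section 2 can be illustrated with a short sketch. It is written in Python rather than the MATLAB used later in the paper, it does not call any Hadoop API, and it is not the authors' program: the function names (`partition`, `map_task`, `reduce_task`, `run_job`) are hypothetical, worker threads merely stand in for slave nodes, and key counting is an assumed example workload.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split the record list into n_parts contiguous, near-equal ranges."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` partitions receive one additional record.
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def map_task(part):
    """Work of one simulated slave node: count the keys in its partition."""
    return Counter(part)


def reduce_task(partial_counts):
    """Work of the simulated master node: merge the partial results."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def run_job(records, n_slaves=4):
    """Partition the data, process each part on a worker, then merge."""
    parts = partition(records, n_slaves)
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        partials = list(pool.map(map_task, parts))
    return reduce_task(partials)


if __name__ == "__main__":
    log = ["GET", "POST", "GET", "PUT", "GET", "POST"]
    print(run_job(log, n_slaves=3))
```

Because each partition is processed independently and the merge step is associative, the merged result does not depend on how many simulated slave nodes are used; only the degree of parallelism changes.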

[Fig. (2). Massive data cloud processing Hadoop platform. Recoverable labels: Part1, Part2, Part3, Part4, Part5; Reduce; Reduce.]

3. DESIGN OF DATABASE MASSIVE DATA PROCESSING AND DATA MINING MATHEMATICAL MODEL

Massive image processing is a form of large-scale data processing. It not only has high processing requirements but is also very demanding on storage space, and ordinary storage can hardly meet the capacity that image processing requires [5, 6]. This paper adopts the Hadoop distributed processing algorithm, in which the data space is partitioned to realize high-speed processing of massive data. The database data is divided into intervals $[w_i, w_{i+1}]$, and the two-dimensional Hadoop data processing function is established as shown in formula (1).

$$\begin{cases} M(m,n) = x_1 + x_2 m + x_3 n + x_4 mn \\ N(m,n) = y_1 + y_2 m + y_3 n + y_4 mn \end{cases} \tag{1}$$

Assume that the memory allocation value is

$$E^T = [E_1, E_2, E_3, E_4]. \tag{2}$$

Then formula (1) and formula (2) can be written in matrix form:

$$e(m,n) = \Phi x, \tag{3}$$

where

$$\Phi = \begin{bmatrix} \varphi & 0 \\ 0 & \varphi \end{bmatrix}. \tag{4}$$

For each storage node,

$$Z = ZR. \tag{5}$$

Assuming the generalized coordinate R is

$$F = H^{-1} x, \tag{6}$$

we can get the distributed node calculation formula

$$X(x,y) = a(x,y)\,x. \tag{7}$$

To realize cloud computing and distributed computing, the master-slave node distributed computing program is written in MATLAB. The main procedures are as follows [7]:

    % Stiff van der Pol system (mu = 1000)
    function dy = vdp100000(t, y)
    dy = zeros(2, 1);
    dy(1) = y(2);
    dy(2) = 1000*(1 - y(1)^2)*y(2) - y(1);

    t0 = 0;
    tf = 8000;

    % Rigid-body (Euler) equations
    function dy = rigid(t, y)
    dy = zeros(3, 1);
    dy(1) = y(2)*y(3);
    dy(2) = -y(1)*y(3);
    dy(3) = -0.51*y(1)*y(2);

    t0 = 0;
    tf = 12;
    % Integrate the rigid-body system and plot the three components
    [T, Y] = ode45('rigid', [0 12], [0 1 1]);
    plot(T, Y(:,1), '-', T, Y(:,2), '*', T, Y(:,3), '+')

4. DESIGN OF HADOOP CLOUD PLATFORM DATABASE MASSIVE DATA PROCESSING SYSTEM

To verify the validity and reliability of the massive data processing mathematical model and algorithm presented in the preceding section, this paper uses MATLAB to simulate massive data processing and designs the Hadoop cloud data processing platform, whose main framework is shown in Fig. (2). As shown in Fig. (2), the Hadoop cloud computing platform is mainly composed of three technologies, including the distributed parallel computing framework MapReduce,
the distributed file system HDFS and the distributed database system HBase.

[Fig. (3). The schematic diagram of Hadoop master node design. Recoverable labels: Browser; Hadoop utility tool; Secondary; JVM; JobTracker; Win operating system; Server.]

[Fig. (4). The schematic diagram of Hadoop slave node design. Recoverable labels: Secondary; JobTracker; JVM; Linux operating system; Server.]

[Fig. (5). The schematic diagram of Hadoop and Hbase running.]

As shown in Fig. (3), the master-slave node design is the main mode by which Hadoop implements distributed computing and data storage. Each database cluster has one master node and multiple slave nodes, and the daemons running on the master node are NameNode, SecondaryNameNode and JobTracker. As shown in Fig. (4), the Hadoop massive data cloud processing system contains many slave nodes, which realize distributed data processing [8]. The daemons running on each slave node are DataNode and TaskTracker. Fig. (5) shows Hadoop and HBase running simultaneously, which makes it possible to lay out the HBase information and data partitions; this
paper uses browser Web logs as the massive data to be processed, and the master node opening is shown in Fig. (6).

[Fig. (6). Master node opening.]

[Fig. (7). The massive data processing computing time simulation curve.]

Fig. (6) shows the master node opening, which is the key to implementing Hadoop massive database data processing [9,10]. To verify the validity and reliability of the system, this paper runs a simulation and obtains the calculation performance shown in Table 1.

Table 1. The massive data processing cloud platform simulation parameters.

    Main Simulation Parameters          Numerical
    Calculated data Min (Mb/s)          0.001
    Calculated data Max (Mb/s)          0.01
    Information capacity Min (Gb)       100
    Information capacity Max (Gb)       100000
    Speed (frame/s)                     1800

Table 1 shows the simulation parameters of the Hadoop cloud massive data processing platform [11]. As the table shows, Hadoop data processing can reach a high level: the minimum value of calculated data is 0.001 Mb/s, the minimum information capacity is 100 Gb and the speed is 1800 frames/s, all of which meet the design requirements.

To compare the effect of different algorithms on massive data processing, this paper compares two traditional algorithms with the Hadoop cloud processing algorithm, with the amount of data growing from an initial dozens of million to several hundreds of megabytes [12, 13]. As shown in Fig. (7), the MATLAB numerical simulation shows that the computing speed of the Hadoop cloud processing platform is significantly higher than that of the two traditional algorithms. The advantage is most pronounced once the data reaches the $10^5$ level, where the data processing time is reduced several times over, which makes it a highly efficient data processing algorithm.

5. SUMMARY

Massive image processing places very high demands on the processor and memory and requires a high-performance processor and large-capacity memory, because
the traditional single-processor or single-core processing and ordinary memory can no longer satisfy the needs of image processing. In this paper, cloud computing is introduced into the massive data processing system, and high-speed data processing is then realized through the Hadoop distributed computing function combined with the automatic distribution of master and slave nodes. Finally, a comparison between the Hadoop distributed processing algorithm and two traditional data processing algorithms shows that the Hadoop distributed algorithm processes data faster and offers a higher information capacity. It is an effective massive data processing algorithm and provides a new method for studying data processing schemes with large data volumes, high capacity and high transmission rates.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

Declared none.

REFERENCES

[1] Wang Wei, Jiang Lina. Research on cloud computing security requirements analysis [J]. Information network security, 2012(1): 35-37.
[2] Sun Fuquan, Zhang Dawei, Cheng Xu, Liu Chao. Construction of enterprise private cloud storage platform based on Hadoop [J]. Journal of Liaoning Technical University, 2011(3): 112-113.
[3] Zhou Zichen, Shen Zhenning. The power supply integrity design method in high speed embedded system [J]. Microcontroller and embedded application, 2010(3): 35-37.
[4] Kuang Shenghui, Li Bo. Analysis of cloud computing architecture and its application case [J]. Computer and digital engineering, 2011, 3(6): 61-63.
[5] Xin Jun, Chen Kang, Zheng Weimin. Research on virtual cluster management technology [J]. Computer science and exploration, 2011, 4(23): 325-327.
[6] Yu Fei. Research on data preprocessing technology in Web log mining [J]. Computer technology and development, 2011, 20(5): 47-50.
[7] Jinhai. On cloud [J]. China Institute of computer communication, 2012, 5(6): 22-25.
[8] Wu Jiyi, Ping Lingdi, Pan Xuezeng, Li Zhuo. Cloud Computing: from concept to platform [J]. Telecom science, 2011, 12(4): 25-27.
[9] Xue Zhiqiang, Liu Peng, Wen AI. Research on distributed file system management strategy [J]. Computer knowledge Technique, 2011(1): 112-113.
[10] Li Cheng, Cheng Xiaoyu. Analysis of high-speed DSP signal integrity simulation based on Hyperlynx [J]. Electronic devices, 2011, 33(2): 45-47.
[11] Zhu Zhijun, Le Zichun, Zhu ran. Research and implementation of OBS network simulation platform based on NS2 [J]. Journal of communication, 2012, 30(9): 128-134.
[12] Zhang Jian. The concept of cloud computing and its influence analysis [J]. Telecommunication technology, 2012, 1(12): 15-16.
[13] Chen Quan, Deng Qianni. Cloud computing and its key technology [J]. Computer application, 2011, 29(9): 263-264.

Received: September 16, 2014. Revised: December 23, 2014. Accepted: December 31, 2014.

Xiaoyong and Chunrong; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.