A Study on Data Analysis Process Management System in MapReduce using BPM
Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 2

1 Electronics and Telecommunications Research Institute, 138 Gajeongno, Yuseong-gu, Daejeon, Korea
2 Dept. of Computer Science and Engineering, Chungnam National University, 0 Gung-dong, Yuseong-gu, Daejeon, Korea
1 {ys5315, dbzzang, bangs}@etri.re.kr, 2 [email protected]

Abstract. MapReduce is a programming model for distributed systems that processes massive data, and it has been adopted as an analysis model in both academia and industry. However, developers who implement MapReduce often lack an understanding of data analysis, while data analysts have difficulty programming MapReduce for their various analyses. It is therefore hard for developers to deliver the analysis output that analysts require. To bridge this gap between MapReduce developers and data analysts, this study proposes a new MapReduce analysis process management system based on BPM (Business Process Management). The system is designed to act as a mutually complementary intermediary between MapReduce developers and analysts, and it can respond flexibly to any change in the analysis procedure.

Keywords: MapReduce, BPM, Analysis system, Data processing

1 Introduction

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks [1], [2]. MapReduce has spread widely through Hadoop, the Apache group's open source project [3]. Hadoop provides an environment for running MapReduce functions on HDFS, an open source file system modeled on GFS (Google File System) [4], [5]. However, it is not easy for data analysts to understand the MapReduce framework and to program it to suit their analysis purposes.
Conversely, it is difficult for program developers to enter the analysis domain, because doing so requires understanding the properties of the data, applying analysis methods efficiently, and interpreting the results. To solve these problems, this paper utilizes BPM (Business Process Management), which supports data interworking between systems and flow control according to output data results [6], [7].

BPM's process modeling procedure is as follows. First, the applications that interwork with the system are defined in advance. Next, a process modeler defines the activities corresponding to each step of the procedure. The modeler then establishes the procedural flow between activities and maps a suitable application to each activity. The BPM engine is the system that initiates the processes defined through such modeling.

The MapReduce analysis process management system proposed in this paper is designed to run MapReduce jobs from BPM applications by means of a BPM engine. It is also designed to enable intelligent data analysis by controlling the conditions between different MapReduce jobs. With this architecture, the MapReduce applications and the analysis process are loosely coupled, so the system can accommodate changes to the analysis procedure. Because BPM performs each scheduled MapReduce job, efficient data refinement and transmission are also possible.

The paper is organized as follows. In Section 2, we describe MapReduce and the BPM system. In Section 3, the proposed MapReduce analysis process management system is presented. Finally, Section 4 discusses the conclusions and future research directions.

Proceedings, The 4th International Conference on Security-enriched Urban Computing and Smart Grid. SUComS 2013, ASTL Vol. 6, pp. 7-12, 2013. SERSC 2013

2 Related Work

MapReduce is a programming model and an associated implementation for processing and generating large data sets [1]. Programs written in this model are automatically parallelized and executed on a large cluster of commodity machines.
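To make the programming model concrete, the following is a minimal single-process sketch in Python, not Hadoop code. It assumes a word-count task; the names `map_fn`, `reduce_fn`, and `run_local` are hypothetical and introduced only for illustration. The map function turns one input pair into intermediate pairs, and the reduce function merges all values that share an intermediate key. The grouping step in `run_local` stands in for what the real runtime does across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: one (document name, text) pair in, many (word, 1) pairs out."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: Iterable[int]) -> Iterator[int]:
    """Reduce: all counts for one word in, their sum out."""
    yield sum(values)


def run_local(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Iterator[int]],
) -> Dict[str, List[int]]:
    """Single-process stand-in for the runtime: map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)  # the "shuffle": collect values per key
    return {k2: list(reducer(k2, v2s)) for k2, v2s in sorted(groups.items())}


# Illustrative input only; real jobs read their input from distributed storage.
docs = [("doc1", "big data big analysis"), ("doc2", "data analysis process")]
word_counts = run_local(docs, map_fn, reduce_fn)
```

In this sketch the analyst-facing logic lives entirely in `map_fn` and `reduce_fn`; partitioning, scheduling, and failure handling are deliberately absent because the runtime owns them.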
The runtime system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers with no experience in parallel and distributed systems to easily utilize the resources of a large distributed system. A typical MapReduce computation processes many terabytes of data on hundreds or thousands of machines. Programmers find the system easy to use, and more than 100,000 MapReduce jobs are executed on Google's clusters every day [2]. Conceptually, the map and reduce functions supplied by a user have the following associated types:

$$
\begin{aligned}
\text{map}: \quad & (k_1, v_1) \rightarrow \mathrm{list}(k_2, v_2) \\
\text{reduce}: \quad & (k_2, \mathrm{list}(v_2)) \rightarrow \mathrm{list}(v_2)
\end{aligned}
\tag{1}
$$

That is, the input keys and values are drawn from a different domain than the output keys and values. Furthermore, the intermediate keys and values are from the same domain as the output keys and values.

Many people consider BPM (Business Process Management) to be the next step after the workflow wave of the nineties. Therefore, we use workflow terminology to define BPM. BPM includes methods, techniques, and tools to support the design, enactment, management, and analysis of operational business processes [6], [7]. In the last couple of years, many researchers and practitioners have started to realize that the traditional focus on enactment is too restrictive. As a result, new terms such as BPM have been coined. There exist many definitions of BPM, but in most cases it clearly includes WFM (Workflow Management). Note that this definition restricts BPM to operational processes. In other words, processes at the strategic level and processes that cannot be made explicit are excluded.

Fig. 1 shows the relationship between WFM and BPM by using the BPM lifecycle [6]. The BPM lifecycle describes the various phases in support of operational business processes. In the configuration phase, designs are implemented by configuring a process-aware information system. After configuration, the enactment phase starts, in which the operational business processes are executed using the configured system. In the diagnosis phase, the operational processes are analyzed to identify problems and to find things that can be improved. The focus of traditional workflow management is on the lower half of the BPM lifecycle. As a result, there is little support for the diagnosis phase. Moreover, analysis and real design support are missing. It is remarkable that few WFM systems support simulation, verification, and validation of process designs. It is also remarkable that few systems support the collection and interpretation of real-time data. Most WFM systems do record data about process tasks, but the traditional systems offer no tools to support any form of diagnosis.

Fig. 1. The BPM lifecycle (process design, system configuration, process enactment, diagnosis) used to compare Workflow Management and Business Process Management

3 System Architecture

The BPM-based analysis process modeling system consists of three layers, as shown in Fig. 2.

1) Data Storage Layer: This is a Hadoop-based physical data storage space. It can include a legacy system that provides the initial data of an analysis or that stores and provides intermediate data.
2) MapReduce Application Layer: This layer contains the implementations of the MapReduce jobs to be performed on the Data Storage Layer. It also contains the implementation of the interworking interface to the legacy system.
3) Analysis Process Layer: This is the layer where the process initiates and controls the applications provided in the MapReduce Application Layer. Conditions can be controlled by providing variables for each application.

Fig. 2. System architecture layers: Data storage layer, MapReduce application layer, Analysis process layer.

The processing procedure for defining the MapReduce analysis process is shown in Fig. 3. The detailed execution procedure is as follows:

1) The MapReduce application developer implements various MapReduce functions to provide services.
2) The implemented MapReduce applications are registered in BPM's MapReduce application repository.
3) The data analysis process modeler finds suitable MapReduce applications and maps them to the activities of the analysis process.
4) The data process modeler models the various analysis processes.
5) The data process modeler registers the analysis process definition in BPM's analysis process repository, and either schedules the time at which the process is initiated or defines the rules for initiating it.
6) Once the input values of the initial variables are defined, BPM creates process instances by initiating the modeled process definitions.
7) The analysis process instances initiate the MapReduce applications that were mapped during modeling. Each called application performs the actual analysis work by initiating the defined MapReduce job. The analyzed data is then used as input data for the next analysis stage.
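The seven steps above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: `AnalysisProcessEngine`, `Activity`, `ProcessDefinition`, and the three toy "jobs" are hypothetical names, and each job is a plain function standing in for an application that would submit a real MapReduce job to the Data Storage Layer. The sketch shows registration of applications (steps 1 and 2), mapping applications to activities in a process definition (steps 3 to 5), and an instance run in which process variables carry each stage's output to the next stage and a condition decides whether an activity runs (steps 6 and 7).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Variables = Dict[str, Any]
# Hypothetical application contract: read process variables, return updates.
Application = Callable[[Variables], Variables]


@dataclass
class Activity:
    name: str
    application: str  # key into the application repository
    condition: Optional[Callable[[Variables], bool]] = None  # run only if true


@dataclass
class ProcessDefinition:
    name: str
    activities: List[Activity] = field(default_factory=list)


class AnalysisProcessEngine:
    """Toy stand-in for a BPM engine with two repositories."""

    def __init__(self) -> None:
        self.applications: Dict[str, Application] = {}
        self.processes: Dict[str, ProcessDefinition] = {}

    def register_application(self, name: str, app: Application) -> None:
        self.applications[name] = app

    def register_process(self, definition: ProcessDefinition) -> None:
        self.processes[definition.name] = definition

    def start(self, process_name: str, initial: Variables) -> Tuple[Variables, List[str]]:
        """Create one process instance and run its activities in order."""
        variables = dict(initial)
        executed: List[str] = []
        for activity in self.processes[process_name].activities:
            if activity.condition is not None and not activity.condition(variables):
                continue  # condition on process variables skips this activity
            app = self.applications[activity.application]
            variables.update(app(variables))  # output feeds the next stage
            executed.append(activity.name)
        return variables, executed


# Toy "MapReduce applications"; real ones would launch jobs on the cluster.
def filter_job(v: Variables) -> Variables:
    return {"records": [x for x in v["records"] if x >= v["min_value"]]}


def count_job(v: Variables) -> Variables:
    return {"record_count": len(v["records"])}


def mean_job(v: Variables) -> Variables:
    return {"mean": sum(v["records"]) / len(v["records"])}


engine = AnalysisProcessEngine()
engine.register_application("filter", filter_job)
engine.register_application("count", count_job)
engine.register_application("mean", mean_job)
engine.register_process(ProcessDefinition("daily-analysis", [
    Activity("refine", "filter"),
    Activity("count", "count"),
    Activity("summarize", "mean", condition=lambda v: v["record_count"] > 0),
]))
result, executed = engine.start("daily-analysis", {"records": [3, 8, 1, 10], "min_value": 5})
```

Because activities refer to applications only by repository key, the analysis procedure can be changed by registering a different `ProcessDefinition` without touching the job implementations, which is the loose coupling the architecture aims for.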
Fig. 3. Definition and initiation procedure of the MapReduce analysis process: 1) ~ 2) MapReduce application development and registration. 3) ~ 5) Analysis process modeling and registration. 6) ~ 7) Analysis process initiation and execution.

4 Conclusion

In this paper, BPM was applied as a mutual intermediary system between MapReduce developers who lack knowledge of data analysis and data analysts who lack experience in program implementation. The proposed MapReduce analysis process management system is designed to run MapReduce jobs from BPM applications by means of a BPM engine. It is also designed to enable intelligent data analysis by controlling the conditions between different MapReduce jobs. With this architecture, the MapReduce applications and the analysis process are loosely coupled, so the system can accommodate changes to the analysis procedure. The characteristics of BPM also make it possible to extend the system to various services, for example transmitting analysis results to legacy systems other than MapReduce, refining data and accumulating it in an RDBMS, and interworking with user work systems. In the future, we intend to expose the implementations of MapReduce jobs as services, register them, and develop a more extensive process modeling system by applying an SOA/ESB [8] architecture in which analyses are composed from combinations of analysis services.
Acknowledgments. This work was supported by the IT R&D program of MKE/KEIT (Project No , Development of Semantic based Open USN Service Platform).

References

1. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the USENIX Symposium on Operating Systems Design & Implementation (OSDI) (2004)
2. Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. J. Comm. of the ACM. 53, 1 (2010)
3. White, T.: Hadoop: The Definitive Guide. O'Reilly, Sebastopol (2009)
4. Ghemawat, S., Gobioff, H., Leung, S.: The Google File System. In: Symposium on Operating Systems Principles (2003)
5. Hadoop Distributed File System
6. van der Aalst, W.M.P., ter Hofstede, A.H.M., Weske, M.: Business Process Management: A survey. In: Lecture Notes in Computer Science, LNCS, vol. 2678 (2003)
7. Jung, J., Kong, J., Park, J.: Service Integration Toward Ubiquitous Business Process Management. In: IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2008 (2008)
8. Chappell, D.A.: Enterprise Service Bus. O'Reilly, Sebastopol (2004)
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
Design of Media measurement and monitoring system based on Internet of Things
Design of Media measurement and monitoring system based on Internet of Things Hyunjoong Kang 1, Marie Kim 1, MyungNam Bae 1, Hyo-Chan Bang 1, 1 Electronics and Telecommunications Research Institute, 138
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
What is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group [email protected] ABSTRACT We define analytic infrastructure to be the services,
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 8, August 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,[email protected]
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
Keywords: Big Data, HDFS, Map Reduce, Hadoop
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application
2012 International Conference on Information and Computer Applications (ICICA 2012) IPCSIT vol. 24 (2012) (2012) IACSIT Press, Singapore A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs
Cyber Forensic for Hadoop based Cloud System
Cyber Forensic for Hadoop based Cloud System ChaeHo Cho 1, SungHo Chin 2 and * Kwang Sik Chung 3 1 Korea National Open University graduate school Dept. of Computer Science 2 LG Electronics CTO Division
R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 [email protected], 2 [email protected],
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.
Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
Log Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, [email protected] Amruta Deshpande Department of Computer Science, [email protected]
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework
Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),
Development of CEP System based on Big Data Analysis Techniques and Its Application
, pp.26-30 http://dx.doi.org/10.14257/astl.2015.98.07 Development of CEP System based on Big Data Analysis Techniques and Its Application Mi-Jin Kim 1, Yun-Sik Yu 1 1 Convergence of IT Devices Institute
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Research Article Hadoop-Based Distributed Sensor Node Management System
Distributed Networks, Article ID 61868, 7 pages http://dx.doi.org/1.1155/214/61868 Research Article Hadoop-Based Distributed Node Management System In-Yong Jung, Ki-Hyun Kim, Byong-John Han, and Chang-Sung
Reducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan
Reducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan Abstract Big Data is revolutionizing 21st-century with increasingly huge amounts of data to store and be
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
Fault Tolerance in Hadoop for Work Migration
1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
Efficient Data Replication Scheme based on Hadoop Distributed File System
, pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
Cloud Computing based on the Hadoop Platform
Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In the recent years,cloud computing has come forth as the new IT paradigm.
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
Cloud Computing based Livestock Monitoring and Disease Forecasting System
, pp.313-320 http://dx.doi.org/10.14257/ijsh.2013.7.6.30 Cloud Computing based Livestock Monitoring and Disease Forecasting System Seokkyun Jeong 1, Hoseok Jeong 2, Haengkon Kim 3 and Hyun Yoe 4 1,2,4
The Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 [email protected] ABSTRACT Many cluster owners and operators have
Supporting the BPM lifecycle with FileNet
Supporting the BPM lifecycle with FileNet Mariska Netjes Hajo A. Reijers Wil. M.P. van der Aalst Outline Introduction Evaluation approach Evaluation of FileNet Conclusions Business Process Management Supporting
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
Big Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
marlabs driving digital agility WHITEPAPER Big Data and Hadoop
marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil
How To Analyze Log Files In A Web Application On A Hadoop Mapreduce System
Analyzing Web Application Log Files to Find Hit Count Through the Utilization of Hadoop MapReduce in Cloud Computing Environment Sayalee Narkhede Department of Information Technology Maharashtra Institute
Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster
Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster Amresh Kumar Department of Computer Science & Engineering, Christ University Faculty of Engineering
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce
Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
Big Application Execution on Cloud using Hadoop Distributed File System
Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------
The Hadoop Framework
The Hadoop Framework Nils Braden University of Applied Sciences Gießen-Friedberg Wiesenstraße 14 35390 Gießen [email protected] Abstract. The Hadoop Framework offers an approach to large-scale
Business Process Management: A personal view
Business Process Management: A personal view W.M.P. van der Aalst Department of Technology Management Eindhoven University of Technology, The Netherlands [email protected] 1 Introduction Business
Secret Sharing based on XOR for Efficient Data Recovery in Cloud
Secret Sharing based on XOR for Efficient Data Recovery in Cloud Computing Environment Su-Hyun Kim, Im-Yeong Lee, First Author Division of Computer Software Engineering, Soonchunhyang University, [email protected]
METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT
METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT 1 SEUNGHO HAN, 2 MYOUNGJIN KIM, 3 YUN CUI, 4 SEUNGHYUN SEO, 5 SEUNGBUM SEO, 6 HANKU LEE 1,2,3,4,5 Department
On a Hadoop-based Analytics Service System
Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology
Analysis and Modeling of MapReduce s Performance on Hadoop YARN
Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University [email protected] Dr. Thomas C. Bressoud Dept. of Mathematics and
Task Scheduling Algorithm for Map Reduce To Control Load Balancing In Big Data
Task Scheduling Algorithm for Map Reduce To Control Load Balancing In Big Data Ms.N.Saranya, M.E., (CSE), Jay Shriram Group of Institutions, Tirupur. [email protected] Abstract- Load balancing is biggest
Data-Aware Service Choreographies through Transparent Data Exchange
Institute of Architecture of Application Systems Data-Aware Service Choreographies through Transparent Data Exchange Michael Hahn, Dimka Karastoyanova, and Frank Leymann Institute of Architecture of Application
Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology
Home Appliance Control and Monitoring System Model Based on Cloud Computing Technology Yun Cui 1, Myoungjin Kim 1, Seung-woo Kum 3, Jong-jin Jung 3, Tae-Beom Lim 3, Hanku Lee 2, *, and Okkyung Choi 2 1
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Big Data Processing with MapReduce for E-Book
Big Data Processing with MapReduce for E-Book Tae Ho Hong 2, Chang Ho Yun 1,2, Jong Won Park 1,2, Hak Geon Lee 2, Hae Sun Jung 1 and Yong Woo Lee 1,2 1 The Ubiquitous (Smart) City Consortium 2 The University
Snapshots in Hadoop Distributed File System
Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any
Introduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
Optimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
, pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik
Integration of Hadoop Cluster Prototype and Analysis Software for SMB
Vol.58 (Clound and Super Computing 2014), pp.1-5 http://dx.doi.org/10.14257/astl.2014.58.01 Integration of Hadoop Cluster Prototype and Analysis Software for SMB Byung-Rae Cha 1, Yoo-Kang Ji 2, Jong-Won
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
Scalable Multiple NameNodes Hadoop Cloud Storage System
Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai
Mobile Cloud Computing for Data-Intensive Applications
Mobile Cloud Computing for Data-Intensive Applications Senior Thesis Final Report Vincent Teo, [email protected] Advisor: Professor Priya Narasimhan, [email protected] Abstract The computational and storage
MapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
Introduction to Hadoop
1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools
Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759
A token-based authentication security scheme for Hadoop distributed file system using elliptic curve cryptography
J Comput Virol Hack Tech (2015) 11:137 142 DOI 10.1007/s11416-014-0236-5 ORIGINAL PAPER A token-based authentication security scheme for Hadoop distributed file system using elliptic curve cryptography
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop
Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop Sergio Hernández 1, S.J. van Zelst 2, Joaquín Ezpeleta 1, and Wil M.P. van der Aalst 2 1 Department of Computer Science and Systems Engineering
A SURVEY ON MAPREDUCE IN CLOUD COMPUTING
A SURVEY ON MAPREDUCE IN CLOUD COMPUTING Dr.M.Newlin Rajkumar 1, S.Balachandar 2, Dr.V.Venkatesakumar 3, T.Mahadevan 4 1 Asst. Prof, Dept. of CSE,Anna University Regional Centre, Coimbatore, [email protected]
White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP
White Paper Big Data and Hadoop Abhishek S, Java COE www.marlabs.com Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP Table of contents Abstract.. 1 Introduction. 2 What is Big
Virtual Machine Based Resource Allocation For Cloud Computing Environment
Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
ANALYSIS OF SMART METER DATA USING HADOOP
ANALYSIS OF SMART METER DATA USING HADOOP 1 Balaji K. Bodkhe, 2 Dr. Sanjay P. Sood MESCOE Pune, CDAC Mohali Email: 1 [email protected], 2 [email protected] Abstract The government agencies and the
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella's class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
International Journal of Innovative Research in Information Security (IJIRIS) ISSN: 2349-7017(O) Volume 1 Issue 3 (September 2014)
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE N.Alamelu Menaka * Department of Computer Applications Dr.Jabasheela Department of Computer Applications Abstract-We are in the age of big data which
Comparative analysis of Google File System and Hadoop Distributed File System
Comparative analysis of Google File System and Hadoop Distributed File System R.Vijayakumari, R.Kirankumar, K.Gangadhara Rao Dept. of Computer Science, Krishna University, Machilipatnam, India, [email protected]
Apache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
Big Data Analysis using Hadoop components like Flume, MapReduce, Pig and Hive
Big Data Analysis using Hadoop components like Flume, MapReduce, Pig and Hive E. Laxmi Lydia 1,Dr. M.Ben Swarup 2 1 Associate Professor, Department of Computer Science and Engineering, Vignan's Institute
CS246: Mining Massive Datasets Jure Leskovec, Stanford University. http://cs246.stanford.edu
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu CPU Memory Machine Learning, Statistics Classical Data Mining Disk 20+ billion web pages x 20KB = 400+ TB
MapReduce (in the cloud)
MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:
Survey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university
Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment
R&D supporting future cloud computing infrastructure technologies Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment DEMPO Hiroshi, KAMI
Data-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan [email protected] Abstract Every day, we create 2.5 quintillion
Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Big Data Problems Data explosion Data from users on social
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www.irphouse.com/ijict.htm Data
Big Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment
Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
Hadoop on a Low-Budget General Purpose HPC Cluster in Academia
Hadoop on a Low-Budget General Purpose HPC Cluster in Academia Paolo Garza, Paolo Margara, Nicolò Nepote, Luigi Grimaudo, and Elio Piccolo Dipartimento di Automatica e Informatica, Politecnico di Torino,
Approaches for parallel data loading and data querying
Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies [email protected] This paper
http://www.paper.edu.cn
A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
