Autonomic Data Replication in Cloud Environment

International Journal of Electronics and Computer Science Engineering
Available Online at ISSN

Dhananjaya Gupt, Mrs. Anju Bala
Computer Science and Engineering, Thapar University, Patiala, India

Abstract-- Cloud computing is an emerging practice that offers more flexibility in infrastructure and lower cost than traditional computing models. Cloud providers offer everything from access to raw storage capacity to complete application services. The services provided by the cloud can be accessed from anywhere, and data flows from one place to another. Since data moves over the network, there is a chance of data loss. We therefore need to keep multiple copies of data, which makes data replication one of the main issues in cloud computing. In this paper we implement automatic replication of data from a local host to a cloud environment. Data replication is implemented using Hadoop, which stores the data on several nodes. If one node goes down, the data can be retrieved from another node seamlessly.

Keywords - Cloud Computing, Fault tolerance, Data Replication.

I. INTRODUCTION

Cloud computing is an emerging practice that offers more flexibility in infrastructure and lower cost than traditional computing models. Cloud computing software frameworks manage cloud resources and provide scalable and fault-tolerant computing utilities with globally uniform and hardware-transparent user interfaces [1]. The cloud provider takes responsibility for managing infrastructure issues. These days, cloud providers offer everything from access to raw storage capacity to complete application services in areas such as payroll and customer relationship management. Data flows through the network from one location to another while the services provided by the cloud are in use. It therefore becomes a critical task to secure and maintain copies of data as it flows through the network.
Fault tolerance techniques are available that replicate data at different locations to tolerate data loss and ensure continued service. Replication is a key mechanism to achieve scalability, availability and fault tolerance. It can be used to create and maintain copies of data at different sites [13]. When an event affects the primary location where the data resides, the data can be recovered from a secondary location to provide continued service, fault tolerance and higher availability. This carries a performance overhead, because it takes time to recover data from other sites and restart the service, but the fault is tolerated and availability is increased.

The aim of our research work is to implement data replication from a local machine to a cloud environment. Hadoop is used to replicate the data on different sites. Hadoop is an Apache project; all components are available under the Apache open source license. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets [2].
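To illustrate why placing copies at more than one site raises availability, the following is a minimal sketch that is not taken from the paper's implementation. It assumes that each site fails independently and is reachable with a fixed probability; the class name ReplicaAvailability, the method availability and the per-site value 0.95 are hypothetical and chosen only for illustration.

```java
// Toy calculation, not part of the paper: availability of replicated data
// under the simplifying assumption that replica sites fail independently.
class ReplicaAvailability {

    // Probability that at least one of `replicas` copies is reachable,
    // given that each site is up with probability `siteAvailability`.
    static double availability(double siteAvailability, int replicas) {
        if (siteAvailability < 0.0 || siteAvailability > 1.0) {
            throw new IllegalArgumentException("siteAvailability must be in [0, 1]");
        }
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be at least 1");
        }
        // All replicas are down with probability (1 - p)^n.
        return 1.0 - Math.pow(1.0 - siteAvailability, replicas);
    }

    public static void main(String[] args) {
        double perSite = 0.95; // hypothetical per-site availability
        for (int r = 1; r <= 3; r++) {
            System.out.printf("replication factor %d -> availability %.6f%n",
                    r, availability(perSite, r));
        }
    }
}
```

Under this simplified model, each additional replica multiplies the probability that the data is completely unreachable by (1 - p). Correlated failures, such as several virtual machines sharing one physical host, would weaken that gain.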

Figure-1: HDFS Architecture [4].

An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes [4].

The rest of the paper is organized as follows: Section II describes work related to our research and the challenges in a replicated environment. Section III presents the Hybrid Virtualized Architecture. Section IV covers the implementation of this architecture, in which data is replicated on two different sites using Hadoop's HDFS, together with the experimental results. Section V concludes the paper.

II. RELATED WORK

As mentioned in the introduction, data flows through the network, so it becomes critical to secure and maintain data at multiple sites so that any lost data can be recovered without much overhead. Persistent data stored in distributed file systems ranges in size from small to large, is likely read multiple times, and is typically long-lived. In comparison, intermediate data generated in cloud programming paradigms has uniquely contrasting characteristics [6]. Many fault tolerance techniques deal with virtual machine (VM) migration, process migration and application migration to overcome the impact of a fault [9][10]. The time-series based precopy technique migrates a VM from a source host to a target host [11]; in this approach data is transferred in the form of pages. The proactive live process migration mechanism [12] migrates a process before the fault occurs. To tolerate a fault, migration has to take place, which involves a large amount of data transfer.
This is a performance hit, because time is consumed in data transfer and the system is unavailable during that period. Performance and availability can be increased if data is placed at more than one site. Replication is one of the most widely studied phenomena in distributed environments [13]. Replication is a strategy in which multiple copies of some data are stored at multiple sites. When required, data is fetched from the nearest available replica to avoid delay and increase performance. Availability needs to be high in the cloud computing paradigm, which makes replication of data in a cloud environment a challenging task. The difficulty in providing efficient and correct wide-area database replication is that it requires integrating techniques from several fields, including distributed systems, databases, network protocols and operating systems [16]. Data replication schemes over storage providers with a KVS (key-value store) interface are inherently more difficult to realize than replication schemes over providers with richer interfaces [15].

The following are a few of the challenges in a replicated environment:

Data Consistency: Maintaining data integrity and consistency in a replicated environment is of prime importance. High-precision applications may require strict consistency (e.g. one-copy serializability, 1SR) of the updates made by transactions [14].

Downtime during new replica creation: If strict data consistency is to be maintained, performance is severely affected when a new replica is created, because sites cannot fulfill requests until the consistency requirements are met.

IJECSE, Volume 2, Number 2, Dhananjaya Gupt and Mrs. Anju Bala

Maintenance overhead: If files are replicated at more than one site, they occupy storage space and have to be administered. Thus there is an overhead in storing multiple copies.

Lower write performance: The performance of write operations can be dramatically lower for update-heavy applications in a replicated environment, because a transaction may need to update multiple copies [14].

III. HYBRID VIRTUALIZED ARCHITECTURE

Figure-2: Hybrid Virtualized Architecture

Figure 2 shows the Hybrid Virtualized Architecture, which includes a virtual machine workstation (VMware Fusion) to provide virtualization and the Hadoop framework to provide HDFS functionality. Eclipse is used as the Java integrated development environment to write the application code. The virtual environment helped us analyze the cloud environment for different types of application on a single machine. This is a master-slave architecture that provides fault tolerance: if one node fails, the data can be retrieved from the other node. The cluster is set up between the two machines. The architecture consists of a hosting server with VMware installed and two hosted VMs (master and slave), each running Ubuntu. The components of the Hybrid Architecture are as follows:

Local Machine: An application is executed on the local machine, which runs the Windows 7 32-bit platform. The application is developed in Java using Eclipse. The data generated by this application is sent to HDFS, which stores it at multiple locations. In our experiments we kept the replication factor at 2. The application accesses the file system through the HDFS client, which exports the HDFS file system interface.

Master VM: It runs on the Ubuntu platform. Hadoop is set up on this VM, and both the NameNode and a DataNode run on it.
Data written to HDFS is replicated to the DataNodes according to the replication factor, with the NameNode coordinating the placement. To increase reliability and availability, the replication factor can be increased. The NameNode keeps track of which DataNodes are live, so that when one DataNode is down the data can be fetched from the other one.

Slave VM: It runs on the same Ubuntu platform. The same Hadoop version is set up on this VM, and it runs the second DataNode. This node receives data from the master VM. Whenever the DataNode on the master VM fails, the data is automatically fetched from this DataNode.

IV. IMPLEMENTATION AND EXPERIMENTAL RESULTS

We have implemented an application in Java using Eclipse. The data required by the application is retrieved from HDFS, and operations are performed on it on the local machine. When the transactions are done, the data is sent back to HDFS. We have implemented an asynchronous mechanism to replicate data in the virtualized cloud environment. In this architecture, Hadoop is set up on both virtual machines. On the first machine we create the NameNode and a DataNode; on the second machine we create only a DataNode. Hadoop is thus configured to have one NameNode and two DataNodes. Data generated by the application is written to HDFS, where it is replicated on the two DataNodes. When data is required, it is retrieved from HDFS. If either of the DataNodes fails at that moment, the data is automatically recovered from the other one.

The experimental platform and software packages used in this system are as follows:

Table-1: Platform Configuration

TYPE              SPECIFICATION
Processor         Intel i5
CPU Speed         2.5GHz
Memory            4.00 GB
Platform          32-bit OS
Operating System  Ubuntu

Table-2: Software Package versions

SOFTWARE PACKAGES  VERSIONS
JDK                1.6
Eclipse
Hadoop
VMware Fusion

Hadoop provides a web interface for statistics. Using this interface we can see the status of all the nodes. Based on some failure cases, we are able to determine how our data is fetched from the other nodes when the requested node goes down.

Case 1: Figure-3 shows the overall status monitoring of the system. Here we can browse through the system to determine which DataNodes are live. In this case, both DataNodes are live and data can be retrieved from either of them, as chosen by HDFS. The number of live and dead nodes is determined through this interface.

Case 2: Figure-4 shows the details of all the DataNodes on which the data is present. Data is automatically retrieved from the nearest available node.

Case 3: Figure-5 shows access to files stored in HDFS via any of the DataNodes. Data is replicated on two different nodes. If one node goes down, the data is retrieved from the other node, which increases availability and fault tolerance.
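The failure cases above can be mimicked with a small in-memory model. The sketch below is a toy illustration and does not use the Hadoop API or reproduce the authors' application: the class ToyCluster, its fields and the round-robin placement are hypothetical, and the block size of 4 bytes is chosen only to force several blocks. It splits a file into blocks, stores each block on as many DataNodes as the replication factor requires, and reads each block from the first live replica.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of one NameNode and several DataNodes; NOT the Hadoop API.
class ToyCluster {

    // A DataNode stores block bytes keyed by block id and can be marked dead.
    static class DataNode {
        final String name;
        boolean live = true;
        final Map<String, byte[]> blocks = new HashMap<>();
        DataNode(String name) { this.name = name; }
    }

    final List<DataNode> dataNodes = new ArrayList<>();
    final int replicationFactor;
    final int blockSize;
    // NameNode-style metadata: file -> ordered block ids, block id -> replica holders.
    final Map<String, List<String>> fileBlocks = new HashMap<>();
    final Map<String, List<DataNode>> blockLocations = new HashMap<>();

    ToyCluster(int nodeCount, int replicationFactor, int blockSize) {
        if (replicationFactor < 1 || replicationFactor > nodeCount || blockSize < 1) {
            throw new IllegalArgumentException("invalid cluster parameters");
        }
        for (int i = 0; i < nodeCount; i++) {
            dataNodes.add(new DataNode("datanode-" + i));
        }
        this.replicationFactor = replicationFactor;
        this.blockSize = blockSize;
    }

    // Split the data into fixed-size blocks and store every block on
    // `replicationFactor` distinct DataNodes (simple round-robin placement).
    void write(String file, byte[] data) {
        List<String> ids = new ArrayList<>();
        int blockCount = (data.length + blockSize - 1) / blockSize;
        for (int b = 0; b < blockCount; b++) {
            int from = b * blockSize;
            byte[] block = Arrays.copyOfRange(data, from, Math.min(data.length, from + blockSize));
            String id = file + "#" + b;
            List<DataNode> holders = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                DataNode node = dataNodes.get((b + r) % dataNodes.size());
                node.blocks.put(id, block);
                holders.add(node);
            }
            blockLocations.put(id, holders);
            ids.add(id);
        }
        fileBlocks.put(file, ids);
    }

    // Read each block from the first live DataNode that holds a replica.
    byte[] read(String file) {
        List<String> ids = fileBlocks.get(file);
        if (ids == null) {
            throw new IllegalArgumentException("unknown file: " + file);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String id : ids) {
            byte[] block = null;
            for (DataNode node : blockLocations.get(id)) {
                if (node.live) {
                    block = node.blocks.get(id);
                    break;
                }
            }
            if (block == null) {
                throw new IllegalStateException("all replicas are down for " + id);
            }
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two DataNodes and replication factor 2, as in the experiments above.
        ToyCluster cluster = new ToyCluster(2, 2, 4);
        cluster.write("report.txt", "hello replicated world".getBytes(StandardCharsets.UTF_8));
        cluster.dataNodes.get(0).live = false; // simulate one DataNode going down
        System.out.println(new String(cluster.read("report.txt"), StandardCharsets.UTF_8));
    }
}
```

With two DataNodes and a replication factor of 2, every block has a copy on both nodes, so the read in main still returns the original content after one node is marked dead. If both nodes are marked dead, read throws an IllegalStateException, which corresponds to the situation that a higher replication factor is meant to make less likely.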

Figure-3: web interface of the NameNode

Figure-4: accessing live DataNode containing data

Figure-5: accessing data present at the DataNode.

V. CONCLUSION AND FUTURE SCOPE

In this paper we have proposed a virtualized cloud system architecture based on Hadoop. We have presented a highly reliable system that provides data replication in a virtualized cloud environment. Data is replicated on multiple VMs. An application was developed and executed, and the experimental results validate the system's fault tolerance and its replication across multiple nodes: when one node fails, the data is recovered from the other node. Some future extensions are possible: performance could be improved by replicating data in real time with a higher replication factor, which would ensure much higher availability and fault tolerance. This data replication mechanism could also be combined with other fault tolerance techniques to achieve greater reliability.

REFERENCES

[1] Application Architecture for Cloud Computing, white paper.
[2] Apache Hadoop.
[3] Golam Moktader Nayeem, Mohammad Jahangir Alam, Analysis of Different Software Fault Tolerance Techniques.
[4] HDFS (Hadoop Distributed File System) architecture, design.html.
[5] Alain Tchana, Laurent Broto, Daniel Hagimont, Fault Tolerant Approaches in Cloud Computing Infrastructures, The Eighth International Conference on Autonomic and Autonomous System, ICAS.
[6] Steven Y. Ko, Imranul Hoque, Brian Cho and Indranil Gupta, On Availability of Intermediate Data in Cloud Computations.
[7] Geoffroy Vallee, Kulathep Charoenpornwattana, Christian Engelmann, Anand Tikotekar, Stephen L. Scott, A Framework for Proactive Fault Tolerance.
[8] Julia Myint, Thinn Thu Naing, Management of Data Replication for PC Cluster-based Cloud Storage System, International Journal on Cloud Computing: Services and Architecture (IJCCSA), Vol.1, No.3, 31-41, November.
[9] Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott, Proactive Process-Level Live Migration in HPC Environments.
[10] Gang Chen, Hai Jin, Deqing Zou, Bing Bing Zhou, Weizhong Qiang, Gang Hu, SHelp: Automatic Self-healing for Multiple Application Instances in a Virtual Machine Environment, IEEE International Conference on Cluster Computing.
[11] Bolin Hu, Zhou Lei, Yu Lei, Dong Xu, Jiandun Li, A Time-Series Based Precopy Approach for Live Migration of Virtual Machines, IEEE 17th International Conference on Parallel and Distributed Systems.
[12] Chao Wang, Frank Mueller, Christian Engelmann, Proactive process level live migration and back migration in HPC environments.
[13] Sushant Goel, Rajkumar Buyya, Data replication strategies in wide area distributed systems.
[14] Yu, H., and Vahdat, A., Consistent and automatic replica regeneration, Trans. Storage 1, 1 (2005).
[15] Christian Cachin, Birgit Junker, Alessandro Sorniotti, On Limitations of Using Cloud Storage for Data Replication.
[16] Yair Amir, Claudiu Danilov, Michal Miskin-Amir, Jonathan Stanton, Ciprian Tutu, Practical Wide-Area Database Replication, CNDS, Johns Hopkins University.


More information

Distributed Metadata Management Scheme in HDFS

Distributed Metadata Management Scheme in HDFS International Journal of Scientific and Research Publications, Volume 3, Issue 5, May 2013 1 Distributed Metadata Management Scheme in HDFS Mrudula Varade *, Vimla Jethani ** * Department of Computer Engineering,

More information

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING Exploration on Service Matching Methodology Based On Description Logic using Similarity Performance Parameters K.Jayasri Final Year Student IFET College of engineering nishajayasri@gmail.com R.Rajmohan

More information

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

More information

A Cost-Evaluation of MapReduce Applications in the Cloud

A Cost-Evaluation of MapReduce Applications in the Cloud 1/23 A Cost-Evaluation of MapReduce Applications in the Cloud Diana Moise, Alexandra Carpen-Amarie Gabriel Antoniu, Luc Bougé KerData team 2/23 1 MapReduce applications - case study 2 3 4 5 3/23 MapReduce

More information

Minimize Response Time Using Distance Based Load Balancer Selection Scheme

Minimize Response Time Using Distance Based Load Balancer Selection Scheme Minimize Response Time Using Distance Based Load Balancer Selection Scheme K. Durga Priyanka M.Tech CSE Dept., Institute of Aeronautical Engineering, HYD-500043, Andhra Pradesh, India. Dr.N. Chandra Sekhar

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT Hadoop on OpenStack Cloud Dmitry Mescheryakov Software Engineer, @MirantisIT Agenda OpenStack Sahara Demo Hadoop Performance on Cloud Conclusion OpenStack Open source cloud computing platform 17,209 commits

More information

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani

More information

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds Deepal Jayasinghe, Simon Malkowski, Qingyang Wang, Jack Li, Pengcheng Xiong, Calton Pu Outline Motivation

More information

ISSN:2320-0790. Keywords: HDFS, Replication, Map-Reduce I Introduction:

ISSN:2320-0790. Keywords: HDFS, Replication, Map-Reduce I Introduction: ISSN:2320-0790 Dynamic Data Replication for HPC Analytics Applications in Hadoop Ragupathi T 1, Sujaudeen N 2 1 PG Scholar, Department of CSE, SSN College of Engineering, Chennai, India 2 Assistant Professor,

More information

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Hadoop Distributed File System. Jordan Prosch, Matt Kipps Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?

More information

Group Based Load Balancing Algorithm in Cloud Computing Virtualization

Group Based Load Balancing Algorithm in Cloud Computing Virtualization Group Based Load Balancing Algorithm in Cloud Computing Virtualization Rishi Bhardwaj, 2 Sangeeta Mittal, Student, 2 Assistant Professor, Department of Computer Science, Jaypee Institute of Information

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

The Recovery System for Hadoop Cluster

The Recovery System for Hadoop Cluster The Recovery System for Hadoop Cluster Prof. Priya Deshpande Dept. of Information Technology MIT College of engineering Pune, India priyardeshpande@gmail.com Darshan Bora Dept. of Information Technology

More information

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Qloud Demonstration 15 319, spring 2010 3 rd Lecture, Jan 19 th Suhail Rehman Time to check out the Qloud! Enough Talk! Time for some Action! Finally you can have your own

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Two-Level Cooperation in Autonomic Cloud Resource Management

Two-Level Cooperation in Autonomic Cloud Resource Management Two-Level Cooperation in Autonomic Cloud Resource Management Giang Son Tran, Laurent Broto, and Daniel Hagimont ENSEEIHT University of Toulouse, Toulouse, France Email: {giang.tran, laurent.broto, daniel.hagimont}@enseeiht.fr

More information

This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 12902

This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 12902 Open Archive TOULOUSE Archive Ouverte (OATAO) OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

Research Article Hadoop-Based Distributed Sensor Node Management System

Research Article Hadoop-Based Distributed Sensor Node Management System Distributed Networks, Article ID 61868, 7 pages http://dx.doi.org/1.1155/214/61868 Research Article Hadoop-Based Distributed Node Management System In-Yong Jung, Ki-Hyun Kim, Byong-John Han, and Chang-Sung

More information

Efficient Analysis of Cloud-based enterprise information application systems Hua Yi Lin 1, Meng-Yen Hsieh 2, Yu-Bin Chiu 1 and Jiann-Gwo Doong 1 1 Department of Information Management, China University

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Online Failure Prediction in Cloud Datacenters

Online Failure Prediction in Cloud Datacenters Online Failure Prediction in Cloud Datacenters Yukihiro Watanabe Yasuhide Matsumoto Once failures occur in a cloud datacenter accommodating a large number of virtual resources, they tend to spread rapidly

More information

USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES

USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES Carlos Oliveira, Vinicius Petrucci, Orlando Loques Universidade Federal Fluminense Niterói, Brazil ABSTRACT In

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Design of Electric Energy Acquisition System on Hadoop

Design of Electric Energy Acquisition System on Hadoop , pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University

More information

Dynamic resource management for energy saving in the cloud computing environment

Dynamic resource management for energy saving in the cloud computing environment Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan

More information

HDFS Space Consolidation

HDFS Space Consolidation HDFS Space Consolidation Aastha Mehta*,1,2, Deepti Banka*,1,2, Kartheek Muthyala*,1,2, Priya Sehgal 1, Ajay Bakre 1 *Student Authors 1 Advanced Technology Group, NetApp Inc., Bangalore, India 2 Birla Institute

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Telecom Data processing and analysis based on Hadoop

Telecom Data processing and analysis based on Hadoop COMPUTER MODELLING & NEW TECHNOLOGIES 214 18(12B) 658-664 Abstract Telecom Data processing and analysis based on Hadoop Guofan Lu, Qingnian Zhang *, Zhao Chen Wuhan University of Technology, Wuhan 4363,China

More information

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations

More information

CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms

CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose,

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Windows Azure and private cloud

Windows Azure and private cloud Windows Azure and private cloud Joe Chou Senior Program Manager China Cloud Innovation Center Customer Advisory Team Microsoft Asia-Pacific Research and Development Group 1 Agenda Cloud Computing Fundamentals

More information

Elastic Load Balancing in Cloud Storage

Elastic Load Balancing in Cloud Storage Elastic Load Balancing in Cloud Storage Surabhi Jain, Deepak Sharma (Lecturer, Department of Computer Science, Lovely Professional University, Phagwara-144402) (Assistant Professor, Department of Computer

More information

Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS

Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS Making a Smooth Transition to a Hybrid Cloud with Microsoft Cloud OS Transitioning from today s highly virtualized data center environments to a true cloud environment requires solutions that let companies

More information

HRG Assessment: Stratus everrun Enterprise

HRG Assessment: Stratus everrun Enterprise HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment James Devine December 15, 2008 Abstract Mapreduce has been a very successful computational technique that has

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers

More information

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents

More information