Hosting Transaction Based Applications on Cloud


Proc. of Int. Conf. on Multimedia Processing, Communication & Info. Tech., MPCIT

A. N. Diggikar (1), Dr. D. H. Rao (2)
(1) Jain College of Engineering, Belgaum, India, vm4anand@gmail.com
(2) Dean, Faculty of Engineering, Visvesvaraya Technological University, Belgaum, India, dr.raodh@gmail.com

Abstract

Cloud computing is an established and accepted paradigm of computing in both industry and academia. Its primary success factors include the elasticity of computing resources and a flexible payment model. The spectrum of challenges created by this computing model is wide. One challenge is to determine the types of software applications that can be hosted by the compute cloud. Traditionally, applications have been broadly classified into analytical and transaction-oriented systems. The features of the compute cloud are more inclined to facilitate hosting of analytical systems. In this paper we analyze the support for hosting transaction-oriented applications on the compute cloud. We propose a model that identifies the components required to enable transactional applications on the cloud. The model's well-defined components will help designers make important design decisions, and the model can be considered a pattern for hosting cloud-based applications.

Index Terms: cloud computing, distributed transactions, data store, data distribution

I. INTRODUCTION

The cloud computing paradigm has revolutionised the way in which information technology giants such as Google, Amazon, and Yahoo do business. This revolution is similar to that of electric grids, which liberated corporations from generating their own electricity and let them focus on their business goals [1]. Data and programs are moving away from personal computers and corporate infrastructure into the cloud. The compute cloud paradigm provides a varied pool of resources, including storage and servers, which are offered as a service.
Any computer with internet access can use the service. A subscriber can access the service when required, unsubscribe when it is no longer needed, and pay only for the time the service was used. This on-demand subscription to computing resources without having to buy any hardware or software, along with a flexible payment model, makes the compute cloud a successful paradigm. To illustrate its impact, the New York Times converted 4 TB of data containing images of articles into sorted PDF format, which was made available online, in just 24 hours for $300 using cloud computing services [3].

The services offered by the cloud can be at different levels of abstraction, namely Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS provides hardware as a service, including network, servers, and storage. PaaS adds software such as the operating system and middleware on top of the hardware. In SaaS, cloud service providers maintain everything, including the data. As we traverse from IaaS to SaaS, risk and control shift from users to cloud service providers, as shown in Fig. 1 [2]. The highlighted parts of the figure indicate the parts that are under the control of the cloud service providers.

[Figure 1: Shift of Control and Risk in the Compute Cloud. The figure compares an organization's dedicated IT infrastructure with IaaS, PaaS, and SaaS across the layers data, applications, environments, servers, storage, and network, marking which layers are controlled by the organization and which by the service providers.]

DOI: 03.AETS Association of Computer Electronics and Electrical Engineers, 2013

The important characteristics that motivate enterprises to host applications on the compute cloud are:

Elasticity: applications deployed in the cloud can handle spikes in user requests for resources on demand. For example, in Amazon EC2 a server can be added in minutes when required.

Zero maintenance: enterprises that maintain internal IT infrastructure invest time and money in running upgrade routines and applying service packs, which are taken care of by the cloud service providers [4]. This allows enterprises to focus on their core competencies and leave maintenance to the service providers.

Reliability: the compute cloud ensures higher reliability because service providers have redundant data centres for backup and on-demand access to resources to handle failures. The risk is also transferred to the provider, who is better equipped to manage security with dedicated expert staff, leading to enhanced reliability.

Efficient pricing: usage-based pricing allows enterprises to start small, with the option to scale up as requirements grow. This avoids huge initial capital investments to set up the infrastructure. Because service providers maintain data centres in locations with lower costs for cooling, electricity, taxes, property, and labour, pricing remains efficient.

These characteristics attract enterprises to deploy their applications on the cloud and exploit its advantages. However, it has been observed that the cloud cannot support all types of applications. A study of data management tools on the cloud suggests that transaction-oriented systems are difficult to deploy there and that the cloud is more suitable for analytical and batch processing systems.
Currently the cloud is more suitable for applications that use a shared-nothing (SN) architecture, which consists of multiple nodes, each with its own input and output devices, memory, and disks [5]. Hence, supporting transaction-oriented systems on the cloud requires a distributed database that implements the SN architecture. We can consider two data management options for deploying a distributed database on the cloud. One is to use traditional relational databases; the other is to implement a distributed database using the available cloud data management services. The relational databases from Microsoft (SQL Server), Sybase, IBM, and Oracle either do not implement the SN architecture or are suited to data warehousing systems. Prominent cloud data management services include Google's Bigtable [6], Amazon's SimpleDB [7], and PNUTS from Yahoo [8]. The design choices of these cloud data management tools include the SN architecture, but the tools were developed for their providers' internal systems and give availability a higher priority than data consistency. Transaction-oriented systems, in contrast, give priority to strong data consistency, with support for the atomicity, consistency, isolation, and durability (ACID) properties. However, none of the cloud data management tools considers support for the ACID properties. Hence the study on data management techniques concludes that neither the traditional databases nor the cloud data management services are suitable for supporting transaction-oriented systems [1].

In this paper we propose a model that identifies the components required to support transactions on the compute cloud. We provide the important design choices for these components, which will help enterprises design a transaction-oriented system on the cloud. The design choices ensure the ACID properties in the cloud while also ensuring that the system can leverage the advantages of the compute cloud.
The design choices for the components combine established principles of traditional database systems with distributed data management principles, in order to deal with the drawbacks of cloud data management services.

II. RELATED STUDY

With the advent of cloud computing, many data management services evolved to leverage the characteristics of the cloud. Some important services include Google's Bigtable, Amazon's SimpleDB, and Yahoo's PNUTS.

The Bigtable data model uses a multidimensional sorted map that is indexed by row and column keys along with a timestamp, as {row: string, column: string, timestamp}. Read and write operations are based on row keys, which are sorted. Tables are horizontally and dynamically partitioned into tablets, each of which stores an ordered range of row keys. Simple operations such as insert, delete, and update are possible on a single row through APIs. The APIs support only single-row transactions, which means that update operations spanning multiple rows are not possible. Hence Bigtable cannot provide atomic distributed transactions. As Bigtable was created for internal applications at Google, it is ideal for massively parallel computations such as MapReduce and not for distributed transactions. It uses a distributed lock service called Chubby to control data replication and to coordinate the different components of Bigtable [9]. It uses the Paxos algorithm to ensure consistency among replicas [10]. The Chubby lock service is the critical part of the system: it holds the control and coordination information, and its failure brings down the entire system [6].

Amazon's SimpleDB is designed to support other Amazon services such as EC2 (servers on demand) and S3 (storage on demand), which form the Amazon cloud infrastructure [7]. Tables of data are organised as domains, which allow a column to hold multiple values. SimpleDB provides APIs for operations on the data in domains. All values are stored as strings, the only data type supported. Replicas are managed by different clusters and are updated asynchronously. This eventual consistency model is not suitable for enabling transactions on the cloud. SimpleDB runs on top of Amazon's infrastructure, which means that the database is not portable to other database service providers on the cloud [4].

Yahoo's PNUTS [8] was created to support Yahoo's internal applications. The database is horizontally partitioned, with each partition replicated in different regions. PNUTS uses the Yahoo Message Broker for asynchronous replication management, which keeps all the replicas consistent. A record-level mastering scheme is used as the consistency model, which means that only one region at a time holds the master record that serves data requests.
PNUTS allows record mastership to migrate based on request load. It has three components: storage units, a router, and a controller. The storage units store data and serve data requests. The router holds the configuration details of record distribution across the different regions; a B+ tree is used to store the mapping of records to regions based on the primary key. The controller is used for recovery when storage units fail, and it enables load balancing between regions.

To summarise, most cloud data service providers have opted for a data model that enables horizontal partitioning, with data stored as key-value pairs and accessed by row key. The design choices prioritise availability, scalability, and replication but relax the strict constraints of the relational model. All have opted for an eventual consistency model, and their APIs provide data access based on a single row key and cannot support transactions that involve more than one row key. Das et al. [11] have proposed a scalable system that supports transactions on the cloud.

In this paper we propose a model that illustrates the fundamental building blocks of a system that requires transaction support on the cloud. The important challenges to be addressed are the data structure, the partitioning and distribution of application data, the management of data requests, and the coordination among components for load balancing and recovery.

III. SYSTEM DESIGN

A. Data model

Cloud-based applications demand a simple data representation that can be stored and accessed through a programming interface. The data model must enable simple partitioning of application data. We opt for a data model similar to Google's Bigtable, with application data stored as key-value pairs. The reason for choosing this data structure is that the data must be grouped into partitions such that transactions are confined to a single region.
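As a rough illustration of the data model described above, the following Python sketch stores cells as key-value pairs addressed by (row, column, timestamp) and assigns sorted row keys to range partitions. The class names, the `cust...` key scheme, and the split points are hypothetical; they are not taken from Bigtable or from the proposed model. The sketch only shows that rows sharing a key prefix sort together, so a transaction touching them stays within one partition.

```python
from bisect import bisect_right
from collections import defaultdict


class SortedKeyValueTable:
    """Toy Bigtable-style map: (row, column, timestamp) -> value."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row, column):
        """Return the most recent value, or None if the cell is empty."""
        versions = self._rows.get(row, {}).get(column, [])
        return versions[-1][1] if versions else None


class RangePartitioner:
    """Maps a row key to the index of the partition owning its key range."""

    def __init__(self, split_keys):
        # Partition i holds keys in [split_keys[i-1], split_keys[i]).
        self._splits = sorted(split_keys)

    def partition_for(self, row):
        return bisect_right(self._splits, row)


def single_partition(partitioner, rows):
    """True when every row touched by a transaction is in one partition."""
    return len({partitioner.partition_for(r) for r in rows}) == 1


if __name__ == "__main__":
    partitioner = RangePartitioner(["cust100", "cust200"])
    # Both rows belong to the same (hypothetical) customer prefix.
    print(single_partition(partitioner, ["cust042#profile", "cust042#order#1"]))
    # These rows belong to different customers in different key ranges.
    print(single_partition(partitioner, ["cust042#profile", "cust150#order#7"]))
```

Under this assumed key scheme, a transaction that updates a customer's profile and one of that customer's orders touches a single partition, whereas a transaction spanning two customers in different key ranges would need coordination across partitions.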
Confining transactions to a single region is possible because most applications exhibit good spatial and temporal locality when an effective data organisation scheme is used [12].

B. Dynamics

Application data requests are directed to the TP system, which consists of many Transaction and Data Managers (TDMs) and a Central Coordination Service (CCS) to enable distributed transactions. Each TDM contains a partition of the database. The application data is persisted on cloud storage for backup and recovery. The CCS coordinates and monitors all the components of the system. It organizes the metadata related to application tables, and it monitors and coordinates the mapping of partitions to TDMs. All the components of the model are set up in the compute cloud.

C. Persistent Data Store (PDS)

The responsibility of the PDS is to store the application data and logs persistently. There are many alternatives for distributed file storage, including Google's GFS [13], Apache Hadoop HDFS [14], and Amazon S3. One advantage of these cloud storage systems is that they are designed to handle replication and recovery from failures. Although backups of the system data exist across different regions, the model has only one active region.

[Figure 2: Overview of model.]

D. Transaction and Data Manager (TDM)

The database is partitioned across TDMs, each of which handles operations on the data partition it stores. A TDM is a combination of a data store and a transaction manager. The transaction manager runs on top of a data store such as Bigtable or HBase [15] and executes transactions on the partition allocated to it. The TDM is also responsible for transaction logging and concurrency control during the execution of transactions. To reduce the latency of distributed transactions during the commitment protocol, an effective logging technique such as force logging or neighbor main memory logging can be used [16]. Each TDM stores the partition-to-TDM mapping obtained from the CCS so that it can identify the TDMs responsible for handling a transaction request.

E. Central Coordination Service (CCS)

This component maintains the configuration and metadata required to coordinate all the components of the model. The critical mapping and metadata information is referred to as the system state, and it keeps all the other components of the system in a consistent state. The CCS ensures that all TDMs have an up-to-date copy of the mapping information. It carries out regular maintenance operations to detect component failures and initiates recovery in the event of a failure. When a TDM fails, the CCS creates a new TDM, allocates the partition of the failed TDM to it, and informs the other TDMs. An effective data distribution reduces the latency of distributed transactions; to achieve this, the Improved Range data distribution and an online migrating algorithm can be used [17].
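The CCS behaviour described above (holding the partition-to-TDM mapping and reassigning partitions when a TDM fails) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, the `tdm-new-N` identifiers, and the version counter are hypothetical, and failure detection, data recovery from logs, and the push of the mapping to live TDMs are left out.

```python
import itertools


class CentralCoordinationService:
    """Toy CCS that keeps the partition -> TDM mapping (the system state)."""

    def __init__(self, partition_to_tdm):
        self.mapping = dict(partition_to_tdm)
        self.version = 1  # bumped whenever the mapping changes
        self._new_ids = itertools.count(1)

    def snapshot(self):
        """Return the version and a copy of the mapping to send to TDMs."""
        return self.version, dict(self.mapping)

    def handle_tdm_failure(self, failed_tdm):
        """Create a replacement TDM and hand it the failed TDM's partitions.

        Returns the (hypothetical) identifier of the replacement, or None
        when the failed TDM owned no partitions.
        """
        orphaned = [p for p, t in self.mapping.items() if t == failed_tdm]
        if not orphaned:
            return None
        replacement = f"tdm-new-{next(self._new_ids)}"
        for partition in orphaned:
            self.mapping[partition] = replacement
        self.version += 1
        return replacement


if __name__ == "__main__":
    ccs = CentralCoordinationService({0: "tdm-a", 1: "tdm-b", 2: "tdm-a"})
    ccs.handle_tdm_failure("tdm-a")
    # Every TDM would receive this snapshot after the reassignment.
    print(ccs.snapshot())
```

Keeping a version number alongside the mapping is one simple way for a TDM to tell whether its cached copy is stale; the paper does not prescribe this mechanism.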
The Chubby lock service can be part of the CCS, but it uses the Paxos consensus algorithm, which needs 2F+1 servers to tolerate the failure of F servers [18]. The main advantage of the CCS is that it is not involved in transaction execution: transaction requests are forwarded directly to the TDMs, so the CCS is decoupled from the operations of the TDMs and can manage the system efficiently. A cluster management system is a base component that helps set up the cluster of servers on which the other components are built. The CCS builds on the services of the cluster management system to communicate with and monitor the TDMs in the cluster.

F. Transaction Management

In this section we discuss how to ensure ACID transactions on the cloud. The atomicity of transactions is determined by the commitment protocol. Optimized two-phase commit protocols, such as the one used in the ClustRa Telecom Database [16], are suitable for distributed transactions. Other TDMs can serve as backups that store log records during transactions, which reduces the latency of the commitment protocol. These backup TDMs store logs to speed up transactions and also help during recovery if a TDM fails during transaction execution. Since the TDMs are in the cloud, communication between nearby backup TDMs can ensure atomicity using the transaction logs. Consistency, in the sense of enforcing referential integrity constraints, can be handled by the application logic and is therefore not part of our model. The isolation property ensures that updates are not affected by other transactions operating on the same data simultaneously. In the design of our model we adopt timestamp ordering for concurrency control, in which a transaction with an older timestamp gets priority [19].
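To make timestamp ordering concrete, here is a small Python sketch of the textbook basic timestamp-ordering rule for a single TDM's partition. The paper does not say which variant of timestamp ordering it adopts, so this is an assumption chosen for illustration; the class and method names are hypothetical, and timestamps are assumed to be positive integers assigned when a transaction starts. Each item remembers the largest timestamps that have read and written it, and an operation that arrives too late in timestamp order is rejected so that its transaction can be restarted.

```python
class Aborted(Exception):
    """Raised when an operation arrives too late in timestamp order."""


class TimestampOrderedStore:
    """Key-value store guarded by the basic timestamp-ordering rule."""

    def __init__(self):
        self._value = {}
        self._read_ts = {}   # key -> largest timestamp that has read it
        self._write_ts = {}  # key -> timestamp of the latest write

    def read(self, ts, key):
        # A read must not see a value written by a later transaction.
        if ts < self._write_ts.get(key, 0):
            raise Aborted(f"read of {key!r} at ts={ts} is too late")
        self._read_ts[key] = max(self._read_ts.get(key, 0), ts)
        return self._value.get(key)

    def write(self, ts, key, value):
        # A write must not invalidate a later read or overwrite a later write.
        if ts < self._read_ts.get(key, 0) or ts < self._write_ts.get(key, 0):
            raise Aborted(f"write of {key!r} at ts={ts} is too late")
        self._value[key] = value
        self._write_ts[key] = ts


if __name__ == "__main__":
    store = TimestampOrderedStore()
    store.write(1, "balance", 100)      # transaction with timestamp 1
    print(store.read(2, "balance"))     # transaction with timestamp 2
    try:
        store.write(1, "balance", 50)   # arrives after the read at ts=2
    except Aborted as exc:
        print("restart needed:", exc)
```

The effect is that conflicting operations take effect in timestamp order on every item, which is what gives a serializable execution without locks.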

Durability is ensured in our model by writing to the persistent data storage after update transactions complete. When a TDM fails, durability is assured by using the transaction logs in the backup TDMs during recovery.

The recovery component of our model ensures recovery of application data when a TDM fails. Recovery is based on logging at backup TDMs. The CCS detects the failure of a TDM and uses the configuration information and the services of the cluster management system to set up a new TDM. The CCS then initiates recovery of the data held by the failed TDM, using the logs and the application data from the persistent data storage. Finally, it sends the updated mapping information, which now includes the new TDM, to all TDMs. The logging component is embedded in the TDMs because it is part of the commitment protocol during transactions.

Hence our model consists of the following components: data model, persistent data storage, TDM, CCS, logging, commitment protocol, recovery, and cluster management system.

Regarding the implementation direction, we plan to use the open-source Hadoop framework, which facilitates cluster management and persistent data storage, together with the distributed database HBase, which is similar to Bigtable. The CCS and logging can be managed by ZooKeeper, which is part of the Hadoop project. We are in the process of designing a transaction-based application, which will be prepared as a case study to demonstrate the design and development of a cloud-based application built from the components illustrated in this paper.

IV. CONCLUSIONS

The proposed model consists of components that together ensure distributed transactions on the compute cloud. We have identified the set of components that are part of any system that needs to support the execution of distributed transactions on the cloud. We have briefly highlighted implementation directions, which indicate the feasibility of our model. We have highlighted some design choices for most of the components.
This paper will help in the design of a transaction-oriented system to be hosted on the compute cloud.

REFERENCES

[1] D. Abadi, "Data Management in the Cloud: Limitations and Opportunities," Bulletin of IEEE Computer Society Technical Committee on Data Engineering, vol. 32, no. 1, pp. 3-12.
[2] D. Blum, "Security and Risk Management," BurtonGroup, [Online]. Available: [Accessed 2013].
[3] D. Gottfrid, The New York Times, [Online]. Available: [Accessed 2013].
[4] G. Reese, Cloud Application Architectures, Sebastopol: O'Reilly Media, Inc.
[5] M. Mehta and D. DeWitt, "Data placement in shared-nothing parallel database systems," The VLDB Journal, pp.
[6] F. Chang, J. Dean, S. Ghemawat, C. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes and E. Gruber, "Bigtable: A distributed storage system for structured data," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2.
[7] Amazon SimpleDB, [Online]. Available: [Accessed August 2009].
[8] B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver and R. Yerneni, "PNUTS: Yahoo!'s hosted data serving platform," in Proceedings of the VLDB Endowment.
[9] M. Burrows, "The Chubby lock service for loosely coupled distributed systems," in Proceedings of the 7th symposium on Operating systems design and implementation, Seattle.
[10] D. T. Chandra, R. Griesemer and J. Redstone, "Paxos made live: an engineering perspective," in Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, Portland, Oregon.
[11] S. Das, D. Agrawal and A. E. Abbadi, "ElasTraS: An Elastic Transactional Data Store in the Cloud," in USENIX HotClouds Workshop, 2009.
[12] G. Urdaneta, G. Pierre and M. Steen, "Wikipedia workload analysis for decentralized hosting," Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 53, no. 11, pp.
[13] S. Ghemawat, H. Gobioff and S.-T. Leung, "The Google file system," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp., 2003.
[14] D. Borthakur, The Apache Software Foundation, [Online]. Available:. docs/r1.2.1/ hdfs_design. html [Accessed 2013].
[15] Apache HBase, October [Online]. Available: [Accessed 2013].
[16] S.-O. Hvasshovd, O. Torbjornsen, S. E. Bratsberg and P. Holager, "The ClustRa Telecom Database: High Availability, High Throughput, and Real-Time Response," in Proceedings of the 21th International Conference on Very Large Data Bases, San Francisco, CA, USA.
[17] W. Gong, L. Yang, D. Huang and L. Chen, "New Balanced Data Allocating and Online Migrating Algorithms in Database Cluster," in Advances in Data and Web Management, Berlin, Springer Berlin / Heidelberg, 2009, pp.
[18] Z. Wei, G. Pierre and C.-H. Chi, "Scalable Transactions for Web Applications in the Cloud," in Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2009, pp.
[19] P. A. Bernstein and N. Goodman, "Concurrency Control in Distributed Database Systems," ACM Computing Surveys (CSUR), vol. 13, no. 2, pp.

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-

More information

Cloud Data Management: A Short Overview and Comparison of Current Approaches

Cloud Data Management: A Short Overview and Comparison of Current Approaches Cloud Data Management: A Short Overview and Comparison of Current Approaches Siba Mohammad Otto-von-Guericke University Magdeburg siba.mohammad@iti.unimagdeburg.de Sebastian Breß Otto-von-Guericke University

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

ElasTraS: An Elastic Transactional Data Store in the Cloud

ElasTraS: An Elastic Transactional Data Store in the Cloud ElasTraS: An Elastic Transactional Data Store in the Cloud Sudipto Das Divyakant Agrawal Amr El Abbadi Department of Computer Science, UC Santa Barbara, CA, USA {sudipto, agrawal, amr}@cs.ucsb.edu Abstract

More information

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar juliamyint@gmail.com 2 University of Computer

More information

Cloud Database Emergence

Cloud Database Emergence Abstract RDBMS technology is favorable in software based organizations for more than three decades. The corporate organizations had been transformed over the years with respect to adoption of information

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

BIG DATA WEB ORGINATED TECHNOLOGY MEETS TELEVISION BHAVAN GHANDI, ADVANCED RESEARCH ENGINEER SANJEEV MISHRA, DISTINGUISHED ADVANCED RESEARCH ENGINEER

BIG DATA WEB ORGINATED TECHNOLOGY MEETS TELEVISION BHAVAN GHANDI, ADVANCED RESEARCH ENGINEER SANJEEV MISHRA, DISTINGUISHED ADVANCED RESEARCH ENGINEER BIG DATA WEB ORGINATED TECHNOLOGY MEETS TELEVISION BHAVAN GHANDI, ADVANCED RESEARCH ENGINEER SANJEEV MISHRA, DISTINGUISHED ADVANCED RESEARCH ENGINEER TABLE OF CONTENTS INTRODUCTION WHAT IS BIG DATA?...

More information

Communication System Design Projects

Communication System Design Projects Communication System Design Projects PROFESSOR DEJAN KOSTIC PRESENTER: KIRILL BOGDANOV KTH-DB Geo Distributed Key Value Store DESIGN AND DEVELOP GEO DISTRIBUTED KEY VALUE STORE. DEPLOY AND TEST IT ON A

More information

TOWARDS TRANSACTIONAL DATA MANAGEMENT OVER THE CLOUD

TOWARDS TRANSACTIONAL DATA MANAGEMENT OVER THE CLOUD TOWARDS TRANSACTIONAL DATA MANAGEMENT OVER THE CLOUD Rohan G. Tiwari Database Research Group, College of Computing Georgia Institute of Technology Atlanta, USA rtiwari6@gatech.edu Shamkant B. Navathe Database

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

From a Virtualized Computing Nucleus to a Cloud Computing Universe: A Case for Dynamic Clouds

From a Virtualized Computing Nucleus to a Cloud Computing Universe: A Case for Dynamic Clouds From a Virtualized Computing Nucleus to a Cloud Computing Universe: A Case for Dynamic Clouds Divyakant Agrawal Sudipto Das Amr El Abbadi Department of Computer Science University of California, Santa

More information

Amr El Abbadi. Computer Science, UC Santa Barbara amr@cs.ucsb.edu

Amr El Abbadi. Computer Science, UC Santa Barbara amr@cs.ucsb.edu Amr El Abbadi Computer Science, UC Santa Barbara amr@cs.ucsb.edu Collaborators: Divy Agrawal, Sudipto Das, Aaron Elmore, Hatem Mahmoud, Faisal Nawab, and Stacy Patterson. Client Site Client Site Client

More information

Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

More information

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis , 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

More information

Data Management in the Cloud -

Data Management in the Cloud - Data Management in the Cloud - current issues and research directions Patrick Valduriez Esther Pacitti DNAC Congress, Paris, nov. 2010 http://www.med-hoc-net-2010.org SOPHIA ANTIPOLIS - MÉDITERRANÉE Is

More information

Data Management Challenges in Cloud Computing Infrastructures

Data Management Challenges in Cloud Computing Infrastructures Data Management Challenges in Cloud Computing Infrastructures Divyakant Agrawal Amr El Abbadi Shyam Antony Sudipto Das University of California, Santa Barbara {agrawal, amr, shyam, sudipto}@cs.ucsb.edu

More information

Scalable Transaction Management on Cloud Data Management Systems

Scalable Transaction Management on Cloud Data Management Systems IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 10, Issue 5 (Mar. - Apr. 2013), PP 65-74 Scalable Transaction Management on Cloud Data Management Systems 1 Salve

More information

Cloud Computing Training

Cloud Computing Training Cloud Computing Training TechAge Labs Pvt. Ltd. Address : C-46, GF, Sector 2, Noida Phone 1 : 0120-4540894 Phone 2 : 0120-6495333 TechAge Labs 2014 version 1.0 Cloud Computing Training Cloud Computing

More information

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.

More information
