Cloud Computing for Big Data


Hanan Elazhary
Department of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract: Big Data is characterized by large data sets and compute-intensive applications. Examples include computational biology applications such as genome or DNA sequencing, proteomics, computational neuroscience, computational pharmacology, and metagenomics. Physics, business, and government also have many applications. Such data and the corresponding applications present a challenge to traditional storage and computing solutions, in addition to the problem of sharing such large amounts of data among researchers in a controlled fashion. Cloud computing is a promising solution that offers virtually unlimited, on-demand, elastic storage and compute capacity at an affordable cost. The purpose of this paper is to discuss the opportunities and challenges of using cloud computing for processing Big Data. Additionally, it provides a comprehensive survey of existing tools for Big Data and classifies them using a criterion specific to Big Data. Example applications utilizing these tools are also provided.

Keywords: Big Data, Computational Biology, Cloud Computing

1. Introduction

In recent years, there has been an increasing interest in Big Data applications. For example, computational biology [1] aims at gaining greater insight into biology. Computational biology applications include the Human Genome Project (HGP) [2], which aims at the complete understanding of the human genome (the term "genome" refers to the whole set of genes of a given organism). Techniques for such a project include DNA sequencing, or full genome sequencing [3], whose goal is to determine the full DNA sequence of a given genome at a single time. Another application is proteomics [4], which aims at the complete understanding of proteomes (the term "proteome" is a mix of the terms "protein" and "genome" and refers to the whole set of proteins of a given organism). Computational neuroscience [5] refers to the study of the structure of the nervous system of the brain and its information-processing functions. The Mouse Brain Atlas [6, 7] and the Human Brain Atlas [8] are example projects carried out by the Allen Institute for Brain Science. Metagenomics [9] is a field that aims at studying the genetic material obtained from environmental samples. Metagenomics data is both huge and noisy, as it contains fragmented data that can represent about 10,000 species. Computational pharmacology [1] is another field, concerned with finding a linkage between genes and diseases in order to identify potential drugs.

Physics has its applications too. For example, the European Organization for Nuclear Research (CERN) built the largest and most powerful particle collider, the Large Hadron Collider (LHC) [10], to allow physicists to test the predictions of different theories of particle physics and high-energy physics. Data produced by the LHC and LHC-related simulations has been estimated at approximately fifteen petabytes per year. The NASA Center for Climate Simulation (NCCS) [11] processes as much as 32 petabytes of climate observations and simulations [12]. The Sloan Digital Sky Survey (SDSS) [13] uses a dedicated optical telescope for sky survey. Data collection began in 2000 and the images collected so far cover over 35% of the sky.

Amazon [14], eBay [15], Walmart [16], and Facebook [17] are examples of business applications of Big Data. Governmental applications of Big Data include the analysis of cargo traffic from entry ports to exit ports to ensure the security of the global supply chain [18]. Obama's campaign, for example, used Big Data to rally individual voters during the 2012 elections [19].

Big Data applications imply both the storage and the compute-intensive analysis and processing of tremendous amounts of data. In the best case, most analyses are O(N) in the size of the data, and the cost grows further when pairwise or higher-order associations are examined [20]. Unfortunately, traditional storage and computing solutions are inadequate for satisfying the requirements of such data and applications. Another problem is the need to share such data among researchers at different locations in a restricted and controlled fashion. This is in addition to the bandwidth required for the transfer of the data.

Cloud computing offers promising solutions to most of these problems, so the goal of this paper is to provide definitions for cloud computing and to highlight opportunities and challenges in using cloud computing for Big Data. A comprehensive survey of Big Data tools is provided and the tools are classified using a criterion suitable for Big Data. The paper also provides example Big Data applications utilizing the cloud.

The paper is organized as follows: Section 2 provides definitions of cloud computing. Sections 3 and 4 discuss opportunities and challenges of cloud computing for Big Data, respectively. Section 5 discusses and classifies existing tools for Big Data and example applications using these tools. Finally, Section 6 provides the conclusions.

2. Cloud Computing Definition

So far, there is no agreement in the literature about the definition of cloud computing.
To the best of our knowledge, the only formal definition in the literature was published by the National Institute of Standards and Technology (NIST) in September 2011, after years of work and 15 drafts [21]. According to NIST [22], cloud computing is a model with five essential characteristics, three service models, and four deployment models.

The five essential characteristics are:
Network access: Resources are available over the network and accessed through standard mechanisms using different types of clients such as mobile phones, tablets, laptops, PCs, and workstations.
Convenient resource access: A consumer can self-configure resources on demand as needed with minimal interaction with the service provider.
Resource pooling: The resources are pooled to appear unlimited and serve multiple consumers; this is achieved by dynamically assigning and reassigning resources according to demand.
Rapid elastic provisioning of resources with minimal management effort: Resources can be elastically provisioned to scale rapidly outward and inward with demand.
Metered service: Provided services are metered on a pay-per-use basis at some level of abstraction according to the type of service.

The three service models are:
Infrastructure as a Service (IaaS): The consumer is provided with computing resources (such as processors, storage, and networks) to deploy and run arbitrary software, including operating systems and applications, with limited configuration of the computing resources.
Platform as a Service (PaaS): The consumer can deploy and run applications created using programming languages, libraries, services, and tools supported by the provider, also with limited configuration of the application-hosting environment and no configuration of the underlying infrastructure.
Software as a Service (SaaS): The consumer can use applications supplied by the provider and running on a cloud infrastructure, with limited consumer-specific application configuration.

The four deployment models are:
Private cloud: The cloud infrastructure is intended to be used exclusively by a single organization with multiple consumers.
Community cloud: The cloud infrastructure is intended to be used exclusively by a specific community of consumers belonging to different organizations but with common concerns and interests.
Public cloud: The cloud infrastructure is intended to be used by the general public.
Hybrid cloud: The cloud infrastructure is formed of distinct cloud infrastructures (private, community, or public) that are linked together using standards that enable and facilitate portability as needed.

The problem with this definition is that it is over-specified. This makes the definition both overwhelming (because it uses too many terms) and un-extendable (because it is very specific). Accordingly, in spite of the effort exerted to formulate this definition, it has been criticized several times in the literature. According to Daconta [23], the definition is "incomplete, distorted and shortsighted" for many reasons. For example, it limits itself to three out of several possible "things as a service." Besides, it assumes that the three service models (IaaS, PaaS, and SaaS) are layered, which is not always true. It also assumes that the three models are equally important, which is largely untrue. Chou [24] mentioned that "the classification and some definitions of the four deployment models are redundant and inconsistent." For example, a community cloud is in fact a private cloud, but for a specific community. He also criticized the change in classification criteria: a hybrid cloud is defined by being formed of different clouds, whereas a private cloud and a public cloud are classified according to their consumers.
We redefine cloud computing as a computational model that provides metered convenient access to shared services. The five terms employed in this definition can be discussed as follows:
The term "model" is a general term that can describe different possible implementations and deployments; this implies that the service models of NIST (IaaS, PaaS, and SaaS) should not be included as part of the definition, just as Personal Area Network (PAN), Local Area Network (LAN), Metropolitan Area Network (MAN), and Wide Area Network (WAN) are not included as part of the definition of computer networks.
The term "services" is another general term that covers any type of service, including physical and virtual services, hardware resources, software solutions, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
The term "shared" implies pooled, hosted, networked, ubiquitous services within the range of the cloud.
The term "metered" implies pay-per-use service for the benefit of both the consumers and the service providers.
The term "convenient" is an extendable term that incorporates as many features as needed, such as on-demand, rapid (possibly self-) configuration and access to services matching the consumer's needs using different types of clients with minimal service provider interaction. It also incorporates pushing risk out of the business (from the point of view of the consumer) and elastic provisioning with minimal management effort (from the point of view of the service provider).

3. Opportunities of Cloud Computing

Cloud computing offers tremendous opportunities for Big Data. It has several promising capabilities; for example:
Scalability: In cloud computing, capacity is virtually unlimited and so scaling is always possible; instead of running a job on a single computer for 10 hours, it can be run on 10 computers for a single hour.
Elasticity: Resources are provisioned and de-provisioned dynamically according to workload changes. Elasticity has three dimensions: cost, quality, and resources [25].
Pay-per-use capability: Since resources are dynamically provisioned according to workload changes, payment is made according to actual utilization, so money is not wasted.
Sharing: Cloud computing allows the transparent sharing of resources. For example, cloud data stores allow sharing of large datasets instead of caching copies on different individual clusters.
Data reliability: Copies of data can be backed up in different geographical locations to prevent data loss, even due to natural disasters.
Big Data paradigms: A set of paradigms such as MapReduce [26, 27] and Dremel [28] have been developed specifically for processing and analyzing Big Data.
Easier maintenance and upgrade: Maintenance is done by the service provider, allowing researchers to concentrate on research.

4. Challenges of Cloud Computing

In spite of the many opportunities offered by cloud computing for Big Data, there are many challenges that need to be addressed; these include:
Security issues: Consumers have reduced control over the location of sensitive data, and data leakage is possible since data belonging to different customers can be stored in the same location. There is also the problem of privacy of human data on the cloud.
Internet connection: For an application with heavy communication, a stable Internet connection with high bandwidth is required and is not always available.
This is in addition to the time and cost required to transfer large datasets to the cloud or between clouds.
Big Data computational paradigms and tools such as MapReduce do not perform well as the data size increases and cause a corresponding increase in cost, requiring algorithm rethinking and code refactoring [20].
Portability of applications and data among service providers.
Complicated pricing models make pricing difficult to assess and monitor.
Quality of Service (QoS) assurance.

5. Tools for Big Data

Many useful tools already exist for Big Data. In this paper, these are classified using a criterion suitable for Big Data. It is worth noting that OpenCrowd [29] maintains a Cloud Taxonomy of some of these tools, but it provides a more general classification and ignores some very important tools. Our proposed classes are provided in the following sub-sections.

5.1 High-Performance Infrastructure as a Service

Infrastructure as a Service tools can be used for deploying and running arbitrary software, including operating systems and applications. Big Data requires high-performance Infrastructure as a Service tools for running its compute-intensive applications and processing tremendous amounts of data. These include:
IBM Softlayer [30]: Softlayer Bare Metal Servers offer exceptional performance and storage capacity with the speed, power, and flexibility needed for Big Data applications.
ProfitBricks [31]: It provides high-performance IaaS suitable for Big Data applications.
Amazon EC2 [32]: High Performance Computing (HPC) required for Big Data is enabled via Cluster Compute or Cluster GPU servers in the Amazon Web Services (AWS) cloud [33].

5.2 Storage as a Service

Big Data applications require huge storage capacity for tremendous amounts of data. Many tools are suitable for this purpose. They include:
Amazon Elastic Block Store (Amazon EBS) [34]: It provides block-level storage to be used with Amazon Elastic Compute Cloud (Amazon EC2) in the AWS cloud.
Amazon S3 [35]: It provides a simple interface for the ubiquitous storage and retrieval of any amount of data on the Web.
AT&T Synaptic Storage [36]: It provides elastic capacity and allows ubiquitous access to data via an application program interface (API).
Google BigTable [37]: It provides storage for applications utilizing the Google Platform as a Service tools provided by Google App Engine, discussed in the following sub-sections.
HP Cloud Object Store [38]: It allows customers to create an unlimited number of containers with an unlimited number of objects on high-performance HP servers.
Internap Cloud Storage [39]: This is an object-storage system located in high-availability secure data centers and designed to scale to millions of objects.
Zetta [40]: It offers a complete server backup solution.

5.3 Big Data as a Service

Platform as a Service tools can be used for deploying and running applications created using provided programming languages, libraries, services, and tools. Big Data as a Service tools are considered a subcategory of Platform as a Service tools specific to Big Data. A prominent platform for processing Big Data is Apache Hadoop [41], an open-source platform with libraries and utilities for the storage and processing of Big Data. It uses the MapReduce programming model to distribute the processing of data among processing nodes.
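
As a rough illustration of the MapReduce model mentioned above, the following is a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, run_job) and the word-count task are assumptions chosen only to show the three stages. On a real cluster, the map and reduce calls would run in parallel on different nodes and the shuffle would move data over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the list of values collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; each string stands for one input record.
    records = ["big data in the cloud", "the cloud stores big data"]
    print(run_job(records))
```

Because each map call depends only on its own record and each reduce call only on its own key, the work can be split across as many nodes as are available, which is what makes the model a natural fit for elastic cloud resources.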
Big Data as a Service tools include:
Actian DataCloud [42]: A platform that allows the development of integration and management solutions for data and applications of any size.
Altiscale [43]: It provides Hadoop as a Service.
Amazon Kinesis [44]: It allows developing applications that respond to changes in streaming Big Data with a few lines of code.
BigML [45]: It is a cloud-based machine-learning platform that allows the development of predictions for online Big Data. BigML PredictServer [46] is a dedicated cloud image that can be used to produce very fast predictions.
Datameer [47]: It is a platform for Hadoop with pre-built functionality that can be extended by plug-ins and open APIs.
Mortar Data [48]: It provides solutions, code, and tools for high-scale data science. It has been used by several customers such as the Associated Press [49].
Qubole [50]: It offers several tools, including Hadoop MapReduce, for a complete Big Data service. It has been used by 50 customers such as NextDoor [51].
Cloudera [52]: A platform for Hadoop running on the AWS cloud.
MapR [53]: A platform based on Hadoop that allows customers to easily store and process Big Data. It has been adopted by a large number of partners and customers including Google and Amazon.
Pig [54]: It is a high-level programming platform for creating MapReduce programs used with Hadoop.
Hadoop-BAM [55]: It is a library that acts as an integration layer between analysis applications and sequencing data that are processed using Hadoop in computational biology.

5.4 Data as a Service

Data as a Service tools provide data needed for specific applications. Such services are especially needed for Big Data applications since the collection of large datasets is not an easy task. Data as a Service tools include:
AWS Public Datasets [56]: It offers sets of data from eight different domains.
BrightPlanet [57]: It offers data from selected sites on the Web.

5.5 Data Stores as a Service

Big Data cannot be efficiently manipulated using traditional relational database management systems that utilize SQL queries for data management. Thus, about fifty NoSQL data stores [58] have been proposed and developed specifically for Big Data to achieve both speed-up and elasticity. These data stores can be broadly classified into:
Key-Value Stores: They are the simplest NoSQL data stores; they store pairs of keys and values and retrieve values based on the keys. They can also sort the keys to enable range queries and ordered processing of keys. They are fast and scale easily with data size, handling huge numbers of changes per second from millions of simultaneous users in online, gaming, and mobile applications [59]. Example tools include Redis Cloud [60] and Amazon DynamoDB [61].
Document Stores: They pair each key with a document, which is a complex data structure that can contain different key-value pairs, key-array pairs, and nested documents. They are suitable for storing unstructured data, such as social media posts and multimedia. Example tools include MongoDB [62] and CouchDB [63].
Column Stores: They store columns rather than rows of data.
They are suitable for business intelligence applications and data warehouses when new values of a column are supplied for all rows at once. Example tools include Cassandra [64] and Google BigQuery [65].
Graph Stores: They are used to store network data such as social connections. Example tools include Neo4j [66] and Google Horton [67].

SpliceMachine [68] is claimed to be the only Hadoop RDBMS; it allows both scaling up on larger servers and scaling out horizontally. It can support computational biology by handling huge amounts of data such as genomic data.

5.6 Software as a Service

A set of Software as a Service tools has been developed, and more are being developed, to aid in the processing of Big Data. These include:
Plex [69]: It is a Software as a Service (SaaS) ERP for connecting and managing an entire manufacturing process.
Opani [70]: It is a Software as a Service tool for the analysis of Big Data such as MRI images, microscope images of cancer cells, and MySQL databases. It has been adopted for Facebook status updates, Twitter, and Yahoo Finance.

Many Software as a Service tools have been developed specifically for processing biological Big Data, for tasks such as sequence analysis, alignment, and mapping. These tools may be classified as Biology as a Service tools and include ArrayExpressHTS [71], BGI [72], Bioscope [73], CloudAligner [74], Cloud BioLinux [75], CloudBurst [76], Cloud-Coffee [77], Cloud-MAQ [78], CloVR [79], Crossbow [80], Eoulsan [81, 82], FX [83], Jnomics [84], Myrna [85], PeakRanger [86], SEAL [87], SeqWare [88], YunBe [89], and VAT [90]. It is worth noting that some of these tools can be further classified according to their specific tasks [91, 92, 93].

6. Conclusions

Though traditional storage and computing solutions cannot meet the requirements of Big Data applications, cloud computing is a promising candidate for this purpose. Cloud computing has several inherent capabilities that offer real opportunities for Big Data. These include scalability, elasticity, metered pay-per-use capability, sharing, data reliability, and Big Data paradigms, in addition to easier maintenance and upgrade. On the other hand, there are many challenges, such as security and privacy issues, relatively slow Internet connections, the performance of Big Data paradigms for extremely large data sizes, complicated pricing models, and quality of service assurance, in addition to the portability of applications and data among different service providers. A large number of tools already exist for several different types of Big Data applications, and these have been surveyed and discussed in the paper. They are classified using a criterion suitable for Big Data, and example applications that have already benefited from cloud capabilities are provided.

References

[1] onal_biology;
[2]
[3] nome_sequencing;
[4] s;
[5] onal_neuroscience; accessed July
[6] accessed July
[7] Lein E. et al., "Genome-Wide Atlas of Gene Expression in the Adult Mouse Brain," Nature 445 (2007).
[8] accessed July
[9] mics;
[10] dron_collider;
[11]
[12]
[13] ital_sky_survey;
[14] accessed July
[15] accessed July
[16] accessed July
[17] accessed July
[18]
[19] Issenberg S., "How President Obama's campaign used big data to rally individual voters, Part 1," uredstory/508836/how-obama-usedbig-data-to-rally-voters-part-1/;
[20] Kasson P., "Computational Biology in the Cloud: Methods and New Insights from Computing at Scale," Proc. Pac Symp Biocomputing (2013).
[21] cfm;
[22] Mell P. and Grance T., "The NIST Definition of Cloud Computing," Special Publication, National Institute of Standards and Technology (NIST), U.S. Department of Commerce (2011).
[23] Daconta M., "Why NIST's Cloud Definition is Fatally Flawed," ity-check-nist-flawed-cloudframework.aspx;
[24] Chou Y., "An Inconvenient Truth of the NIST Definition of Cloud Computing," accessed July
[25] (cloud_computing); accessed July
[26] Dean J. and Ghemawat S., "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM 51(1) (2008).
[27] Dean J. and Ghemawat S., "MapReduce: A Flexible Data Processing Tool," Communications of the ACM 53(1) (2010).
[28] Melnik S. et al., "Dremel: Interactive Analysis of Web-Scale Datasets," Communications of the ACM 54(6) (2011).
[29]
[30]
[31] accessed July
[32] accessed July
[33] accessed July
[34] accessed July
[35] accessed July
[36] r/html/productdetail/storage_as_a_service.htm;
[37]
[38] accessed July
[39]
[40] accessed July
[41] adoop;
[42] accessed July
[43] accessed July
[44]
[45]
[46]
[47] accessed July
[48] accessed July
[49] accessed July
[50] accessed July
[51] accessed July
[52] dera/en/solutions/partner/amazon-Web-Services.html; accessed July
[53] t-overview/overview; accessed July
[54] accessed July
[55] Niemenmaa M. et al., "Hadoop-BAM: Directly Manipulating Next Generation Sequencing Data in the Cloud," Bioinformatics 28(6) (2012).
[56]
[57] data-as-a-service/;
[58]
[59] g-data-architectures-nosql-use-casesfor-key-value-databases/; accessed July
[60]
[61]
[62]
[63] accessed July
[64] accessed July
[65] uery/;
[66] accessed July
[67]
[68]
[69] accessed July
[70] accessed July
[71] Goncalves A. et al., "A Pipeline for RNA-Seq Data Processing and Quality Assessment," Bioinformatics 27(6) (2011).
[72]
[73] /home/life-science.html; accessed July
[74] Nguyen T. et al., "CloudAligner: A Fast and Full-Featured MapReduce Based Tool for Sequence Mapping," BMC Research Notes 4(171) (2011).
[75] accessed July
[76] Schatz M., "CloudBurst: Highly Sensitive Read Mapping with MapReduce," Bioinformatics 25(11) (2009).
[77] Tommaso P. et al., "Cloud-Coffee: Implementation of a Parallel Consistency-Based Multiple Alignment Algorithm in the T-Coffee Package and its Benchmarking on the Amazon Elastic-Cloud," Bioinformatics 26(15) (2010).
[78] Talukder A. et al., "Cloud-MAQ: The Cloud-Enabled Scalable Whole Genome Reference Assembly Application," Proc. the 7th International Conference on Wireless and Optical Communications Networks (pp. 1-5, 2010).
[79]
[80] Langmead B. et al., "Searching for SNPs with Cloud Computing," Genome Biology 10(11) (2009).
[81]
[82] Jourdren L. et al., "Eoulsan: A Cloud Computing-Based Framework Facilitating High Throughput Sequencing Analyses," Bioinformatics 28(11) (2012).
[83] Hong D. et al., "FX: An RNA-Seq Analysis Tool on the Cloud," Bioinformatics 28(5) (2012).
[84] ce-analysis/10943;
[85] Langmead B. et al., "Cloud-Scale RNA-Sequencing Differential Expression Analysis with Myrna," Genome Biology 11(R83) (2010).
[86] Feng X., "PeakRanger: A Cloud-Enabled Peak Caller for ChIP-Seq Data," BMC Bioinformatics 12(139) (2011).
[87] Pireddu L. et al., "Seal: A Distributed Short Read Mapping and Duplicate Removal Tool," Bioinformatics 27(15) (2011).
[88] O'Connor B. et al., "SeqWare Query Engine: Storing and Searching Sequence Data in the Cloud," BMC Bioinformatics 11(Suppl 12:S2) (2010).
[89] Zhang L. et al., "Gene Set Analysis in the Cloud," Bioinformatics (2011).
[90] Habegger L., "VAT: A Computational Framework to Functionally Annotate Variants in Personal Genomes within a Cloud-Computing Environment," Bioinformatics 28(17) (2012).
[91] Lin Y., Yu C. and Lin Y., "Enabling Large-Scale Biomedical Analysis in the Cloud," BioMed Research International 2013(185679) (2013).
[92] Dai L. et al., "Bioinformatics Clouds for Big Data Manipulation," Biology Direct 7(43) (2012).
[93] Chen J. et al., "Translational Biomedical Informatics in the Cloud: Present and Future," BioMed Research International 2013(658925) (2013).


More information

Cloud Computing Paradigm

Cloud Computing Paradigm Cloud Computing Paradigm Julio Guijarro Automated Infrastructure Lab HP Labs Bristol, UK 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

More information

Elastic Private Clouds

Elastic Private Clouds White Paper Elastic Private Clouds Agile, Efficient and Under Your Control 1 Introduction Most businesses want to spend less time and money building and managing IT infrastructure to focus resources on

More information

Managing and Conducting Biomedical Research on the Cloud Prasad Patil

Managing and Conducting Biomedical Research on the Cloud Prasad Patil Managing and Conducting Biomedical Research on the Cloud Prasad Patil Laboratory for Personalized Medicine Center for Biomedical Informatics Harvard Medical School SaaS & PaaS gmail google docs app engine

More information

Sriram Krishnan, Ph.D. sriram@sdsc.edu

Sriram Krishnan, Ph.D. sriram@sdsc.edu Sriram Krishnan, Ph.D. sriram@sdsc.edu (Re-)Introduction to cloud computing Introduction to the MapReduce and Hadoop Distributed File System Programming model Examples of MapReduce Where/how to run MapReduce

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

Cloud Computing: The Next Computing Paradigm

Cloud Computing: The Next Computing Paradigm Cloud Computing: The Next Computing Paradigm Ronnie D. Caytiles 1, Sunguk Lee and Byungjoo Park 1 * 1 Department of Multimedia Engineering, Hannam University 133 Ojeongdong, Daeduk-gu, Daejeon, Korea rdcaytiles@gmail.com,

More information

What is Analytic Infrastructure and Why Should You Care?

What is Analytic Infrastructure and Why Should You Care? What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group grossman@uic.edu ABSTRACT We define analytic infrastructure to be the services,

More information

The Trend and Challenges of Cloud Computing: A Literature Review

The Trend and Challenges of Cloud Computing: A Literature Review The Trend and Challenges of Cloud Computing: A Literature Review Doi:10.5901/ajis.2013.v2n10p9 Abstract Evwiekpaefe, Abraham E. Department of Mathematics, Computer Science Nigerian Defence Academy, Kaduna,

More information

Cloud computing doesn t yet have a

Cloud computing doesn t yet have a The Case for Cloud Computing Robert L. Grossman University of Illinois at Chicago and Open Data Group To understand clouds and cloud computing, we must first understand the two different types of clouds.

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

More information

Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

More information

Big Data and the Cloud Trends, Applications, and Training

Big Data and the Cloud Trends, Applications, and Training Big Data and the Cloud Trends, Applications, and Training Stavros Christodoulakis MUSIC/TUC Lab School of Electronic and Computer Engineering Technical University of Crete stavros@ced.tuc.gr Data Explosion

More information

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos Research Challenges Overview May 3, 2010 Table of Contents I 1 What Is It? Related Technologies Grid Computing Virtualization Utility Computing Autonomic Computing Is It New? Definition 2 Business Business

More information

Cloud Computing Training

Cloud Computing Training Cloud Computing Training TechAge Labs Pvt. Ltd. Address : C-46, GF, Sector 2, Noida Phone 1 : 0120-4540894 Phone 2 : 0120-6495333 TechAge Labs 2014 version 1.0 Cloud Computing Training Cloud Computing

More information

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Viswanath Nandigam Sriram Krishnan Chaitan Baru Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance

More information

Big Data on Cloud Computing- Security Issues

Big Data on Cloud Computing- Security Issues Big Data on Cloud Computing- Security Issues K Subashini, K Srivaishnavi UG Student, Department of CSE, University College of Engineering, Kanchipuram, Tamilnadu, India ABSTRACT: Cloud computing is now

More information

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

CLOUD COMPUTING USING HADOOP TECHNOLOGY

CLOUD COMPUTING USING HADOOP TECHNOLOGY CLOUD COMPUTING USING HADOOP TECHNOLOGY DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY SALEM B.NARENDRA PRASATH S.PRAVEEN KUMAR 3 rd year CSE Department, 3 rd year CSE Department, Email:narendren.jbk@gmail.com

More information

Cloud-Based Big Data Analytics in Bioinformatics: A Review

Cloud-Based Big Data Analytics in Bioinformatics: A Review Cloud-Based Big Data Analytics in Bioinformatics: A Review Cephas MAWERE 1, Kudakwashe ZVAREVASHE 2, Thamari SENGUDZWA 3, Tendai PADENGA 4 1 Harare Institute of Technology, School of Industrial Sciences

More information

From Internet Data Centers to Data Centers in the Cloud

From Internet Data Centers to Data Centers in the Cloud From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

An Introduction to Cloud Computing Concepts

An Introduction to Cloud Computing Concepts Software Engineering Competence Center TUTORIAL An Introduction to Cloud Computing Concepts Practical Steps for Using Amazon EC2 IaaS Technology Ahmed Mohamed Gamaleldin Senior R&D Engineer-SECC ahmed.gamal.eldin@itida.gov.eg

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS

INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS CLOUD COMPUTING Cloud computing is a model for enabling convenient, ondemand network access to a shared pool of configurable computing

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

The NIST Definition of Cloud Computing (Draft)

The NIST Definition of Cloud Computing (Draft) Special Publication 800-145 (Draft) The NIST Definition of Cloud Computing (Draft) Recommendations of the National Institute of Standards and Technology Peter Mell Timothy Grance NIST Special Publication

More information

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD Building Out Your Cloud-Ready Solutions Clark D. Richey, Jr., Principal Technologist, DoD Slide 1 Agenda Define the problem Explore important aspects of Cloud deployments Wrap up and questions Slide 2

More information

INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Hadoop-BAM and SeqPig

Hadoop-BAM and SeqPig Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer

More information

Grid Computing Vs. Cloud Computing

Grid Computing Vs. Cloud Computing International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

SCADA Cloud Computing

SCADA Cloud Computing SCADA Cloud Computing Information on Cloud Computing with SCADA systems Version: 1.0 Erik Daalder, Business Development Manager Yokogawa Electric Corporation Global SCADA Center T: +31 88 4641 360 E: erik.daalder@nl.yokogawa.com

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

yvette@yvetteagostini.it yvette@yvetteagostini.it

yvette@yvetteagostini.it yvette@yvetteagostini.it 1 The following is merely a collection of notes taken during works, study and just-for-fun activities No copyright infringements intended: all sources are duly listed at the end of the document This work

More information

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers Anwitaman Datta Joint work with Frédérique Oggier (SPMS) School of Computer Engineering Nanyang Technological University

More information

Hadoop. Bioinformatics Big Data

Hadoop. Bioinformatics Big Data Hadoop Bioinformatics Big Data Paolo D Onorio De Meo Mattia D Antonio p.donoriodemeo@cineca.it m.dantonio@cineca.it Big Data Too much information! Big Data Explosive data growth proliferation of data capture

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

How to Do/Evaluate Cloud Computing Research. Young Choon Lee

How to Do/Evaluate Cloud Computing Research. Young Choon Lee How to Do/Evaluate Cloud Computing Research Young Choon Lee Cloud Computing Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing

More information

Cloud Computing and Big Data What Technical Writers Need to Know

Cloud Computing and Big Data What Technical Writers Need to Know Cloud Computing and Big Data What Technical Writers Need to Know Greg Olson, Senior Director Black Duck Software For the Society of Technical Writers Berkeley Chapter Black Duck 2014 Agenda Introduction

More information

Study concluded that success rate for penetration from outside threats higher in corporate data centers

Study concluded that success rate for penetration from outside threats higher in corporate data centers Auditing in the cloud Ownership of data Historically, with the company Company responsible to secure data Firewall, infrastructure hardening, database security Auditing Performed on site by inspecting

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

Cloud Computing Benefits for Educational Institutions

Cloud Computing Benefits for Educational Institutions Cloud Computing Benefits for Educational Institutions ABSTRACT Mr. Ramkumar Lakshminarayanan 1, Dr. Binod Kumar 2, Mr. M. Raju 3 Higher College of Technology, Muscat, Oman rajaramcomputers@gmail.com 1,

More information

A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues

A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues A Study on the Cloud Computing Architecture, Service Models, Applications and Challenging Issues Rajbir Singh 1, Vivek Sharma 2 1, 2 Assistant Professor, Rayat Institute of Engineering and Information

More information

Enhancing Operational Capacities and Capabilities through Cloud Technologies

Enhancing Operational Capacities and Capabilities through Cloud Technologies Enhancing Operational Capacities and Capabilities through Cloud Technologies How freight forwarders and other logistics stakeholders can benefit from cloud-based solutions 2013 vcargo Cloud Pte Ltd All

More information

Cloud Computing 159.735. Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009

Cloud Computing 159.735. Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009 Cloud Computing 159.735 Submitted By : Fahim Ilyas (08497461) Submitted To : Martin Johnson Submitted On: 31 st May, 2009 Table of Contents Introduction... 3 What is Cloud Computing?... 3 Key Characteristics...

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Hadoopizer : a cloud environment for bioinformatics data analysis

Hadoopizer : a cloud environment for bioinformatics data analysis Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Managing Cloud Computing Risk

Managing Cloud Computing Risk Managing Cloud Computing Risk Presented By: Dan Desko; Manager, Internal IT Audit & Risk Advisory Services Schneider Downs & Co. Inc. ddesko@schneiderdowns.com Learning Objectives Understand how to identify

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

SURVEY OF ADAPTING CLOUD COMPUTING IN HEALTHCARE

SURVEY OF ADAPTING CLOUD COMPUTING IN HEALTHCARE SURVEY OF ADAPTING CLOUD COMPUTING IN HEALTHCARE H.Madhusudhana Rao* Md. Rahmathulla** Dr. B Rambhupal Reddy*** Abstract: This paper targets on the productivity of cloud computing technology in healthcare

More information

CHAPTER 8 CLOUD COMPUTING

CHAPTER 8 CLOUD COMPUTING CHAPTER 8 CLOUD COMPUTING SE 458 SERVICE ORIENTED ARCHITECTURE Assist. Prof. Dr. Volkan TUNALI Faculty of Engineering and Natural Sciences / Maltepe University Topics 2 Cloud Computing Essential Characteristics

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Kent State University s Cloud Strategy

Kent State University s Cloud Strategy Kent State University s Cloud Strategy Table of Contents Item Page 1. From the CIO 3 2. Strategic Direction for Cloud Computing at Kent State 4 3. Cloud Computing at Kent State University 5 4. Methodology

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

The little elephant driving Big Data

The little elephant driving Big Data The little elephant driving Big Data Despite the funny-sounding name, Hadoop is a serious enterprise software suite that drives Big Data Hadoop enables the storage and processing of very large databases

More information

Cloud Computing For Distributed University Campus: A Prototype Suggestion

Cloud Computing For Distributed University Campus: A Prototype Suggestion Cloud Computing For Distributed University Campus: A Prototype Suggestion Mehmet Fatih Erkoç, Serhat Bahadir Kert mferkoc@yildiz.edu.tr, sbkert@yildiz.edu.tr Yildiz Technical University (Turkey) Abstract

More information

Student's Awareness of Cloud Computing: Case Study Faculty of Engineering at Aden University, Yemen

Student's Awareness of Cloud Computing: Case Study Faculty of Engineering at Aden University, Yemen Student's Awareness of Cloud Computing: Case Study Faculty of Engineering at Aden University, Yemen Samah Sadeq Ahmed Bagish Department of Information Technology, Faculty of Engineering, Aden University,

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

Sistemi Operativi e Reti. Cloud Computing

Sistemi Operativi e Reti. Cloud Computing 1 Sistemi Operativi e Reti Cloud Computing Facoltà di Scienze Matematiche Fisiche e Naturali Corso di Laurea Magistrale in Informatica Osvaldo Gervasi ogervasi@computer.org 2 Introduction Technologies

More information

Figure 1 Cloud Computing. 1.What is Cloud: Clouds are of specific commercial interest not just on the acquiring tendency to outsource IT

Figure 1 Cloud Computing. 1.What is Cloud: Clouds are of specific commercial interest not just on the acquiring tendency to outsource IT An Overview Of Future Impact Of Cloud Computing Shiva Chaudhry COMPUTER SCIENCE DEPARTMENT IFTM UNIVERSITY MORADABAD Abstraction: The concept of cloud computing has broadcast quickly by the information

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information

More information

Cloud Computing. What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine

Cloud Computing. What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine Cloud Computing and Big Data: What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine What Is Cloud Computing? Cloud computing is a model for enabling ubiquitous, convenient,

More information

Planning the Migration of Enterprise Applications to the Cloud

Planning the Migration of Enterprise Applications to the Cloud Planning the Migration of Enterprise Applications to the Cloud A Guide to Your Migration Options: Private and Public Clouds, Application Evaluation Criteria, and Application Migration Best Practices Introduction

More information

Amazon Web Services. 18.11.2015 Yu Xiao

Amazon Web Services. 18.11.2015 Yu Xiao Amazon Web Services 18.11.2015 Yu Xiao Agenda Introduction to Amazon Web Services(AWS) 7 Steps to Select the Right Architecture for Your Web Applications Private, Public or Hybrid Cloud? AWS Case Study

More information

Chapter 19 Cloud Computing for Multimedia Services

Chapter 19 Cloud Computing for Multimedia Services Chapter 19 Cloud Computing for Multimedia Services 19.1 Cloud Computing Overview 19.2 Multimedia Cloud Computing 19.3 Cloud-Assisted Media Sharing 19.4 Computation Offloading for Multimedia Services 19.5

More information

What is Cloud Computing? Tackling the Challenges of Big Data. Tackling The Challenges of Big Data. Matei Zaharia. Matei Zaharia. Big Data Collection

What is Cloud Computing? Tackling the Challenges of Big Data. Tackling The Challenges of Big Data. Matei Zaharia. Matei Zaharia. Big Data Collection Introduction What is Cloud Computing? Cloud computing means computing resources available on demand Resources can include storage, compute cycles, or software built on top (e.g. database as a service)

More information

Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional

More information