CLOUD COMPUTING USING HADOOP TECHNOLOGY


DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY, SALEM
B. NARENDRA PRASATH, 3rd year, CSE Department, Email: narendren.jbk@gmail.com
S. PRAVEEN KUMAR, 3rd year, CSE Department, Email: praveencse333@gmail.com

Abstract: Cloud computing offers a powerful abstraction that provides a scalable, virtualized infrastructure as a service, hiding the complexity of fine-grained resource management from the end user. Running data analytics applications in the cloud on extremely large data sets is gaining traction because the underlying infrastructure can meet the extreme demands of scalability. We introduce how data are stored and processed in cloud computing using the Big Data capabilities of the Hadoop technology.

Keywords: Cloud computing, Big Data, Hadoop, MapReduce, HDFS

Introduction to Cloud Computing

When you store your photos online instead of on your home computer, or use webmail or a social networking site, you are using a cloud computing service. If you are an organization and want to use, for example, an online invoicing service instead of the in-house one you have maintained for many years, that online invoicing service is a cloud computing service.

Cloud computing refers to the delivery of computing resources over the Internet. Instead of keeping data on your own hard drive or updating applications for your needs, you use a service over the Internet, at another location, to store your information or run its applications. Doing so may give rise to certain privacy implications; for that reason the Office of the Privacy Commissioner of Canada (OPC) has prepared responses to frequently asked questions, along with a fact sheet that provides detailed information on cloud computing and the privacy challenges it presents.

Cloud Computing

Cloud computing is the delivery of computing services over the Internet. Cloud services allow individuals and businesses to use software and hardware that are managed by third parties at remote locations. Examples of cloud services include online file storage, social networking sites, webmail, and online business applications. The cloud computing model allows access to information and computing resources from anywhere a network connection is available. Cloud computing provides a shared pool of resources, including data

storage space, networks, computer processing power, and specialized corporate and user applications. [1]

The following definition of cloud computing has been developed by the U.S. National Institute of Standards and Technology (NIST): Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models. [2]

TYPES OF CLOUDS

Cloud providers typically centre on one type of cloud functionality provisioning: Infrastructure, Platform or Software / Application, though there is potentially no restriction on offering multiple types at the same time. This can often be observed in PaaS (Platform as a Service) providers that also offer specific applications, such as Google App Engine in combination with Google Docs. Owing to this combinatorial capability, these types are also often referred to as components.

Literature and publications typically differ slightly in the terminology applied. This is mostly because some application areas overlap and are therefore difficult to distinguish. As an example, platforms typically have to provide access to resources indirectly, and are thus sometimes confused with infrastructures. Additionally, more popular terms have been introduced in less technologically centred publications. The following list identifies the main types of clouds currently in use:

Infrastructure as a Service (IaaS), also referred to as Resource Clouds, provides (managed and scalable) resources as services to the user; in other words, it basically provides enhanced virtualisation capabilities. Accordingly, different resources may be provided via a service interface:

Data & Storage Clouds deal with reliable access to data of potentially dynamic size, weighing resource usage against access requirements and / or quality definitions. Examples: Amazon S3, SQL Azure.

Compute Clouds provide access to computational resources, i.e. CPUs. So far, such low-level resources cannot really be exploited on their own, so they are typically exposed as part of a virtualized environment (not to be confused with PaaS below), i.e. hypervisors. Compute Cloud providers therefore typically offer the capability to provide computing resources (raw access to resources, unlike PaaS offerings, which supply full software stacks with which to develop and build applications), typically virtualised, in which to execute cloudified services and applications. IaaS (Infrastructure as a Service) offers additional capabilities over a simple compute service. Examples: Amazon EC2, Zimory, Elastichosts.

Platform as a Service (PaaS) provides computational resources via a platform upon which applications and services can be developed and hosted. PaaS typically makes use of dedicated APIs to control the behaviour of a server hosting engine which executes and replicates the execution according to user requests (e.g. access rate). As each provider exposes its own API according to its key capabilities, applications developed for one specific cloud provider cannot be moved to another cloud host; there are, however, attempts to extend generic programming models with cloud capabilities (such as MS Azure). Examples: Force.com, Google App Engine, Windows Azure (Platform).

Software as a Service (SaaS), also sometimes referred to as Service or Application Clouds, offers implementations of specific business functions and business processes provided with specific cloud capabilities, i.e. applications / services delivered using a cloud infrastructure or platform rather than providing cloud features themselves. Often, standard application software functionality is offered within a cloud. Examples: Google Docs, Salesforce CRM, SAP Business by Design.
Overall, cloud computing is not restricted to Infrastructure / Platform / Software as a Service systems, even though it provides enhanced capabilities which act as (vertical) enablers to these systems. As such, I/P/SaaS can be considered specific usage patterns for cloud systems which relate to models already approached by Grid, Web Services etc. Cloud systems are a promising way to implement these models and extend them further. [1][2]

DEPLOYMENT TYPES

Similar to P/I/SaaS, clouds may be hosted and employed in different fashions, depending on the use case and the business model of the provider. So far, clouds have tended to evolve from private, internal solutions (private clouds) used to manage the local infrastructure and the volume of requests, e.g. to ensure availability of highly requested data. This is because the data centres introducing cloud capabilities used these features for internal purposes before considering selling them publicly (public clouds). Only now that providers have gained confidence in publishing and exposing cloud features are the first hybrid solutions emerging. This movement from private via public to combined solutions is often considered a natural evolution of such systems, though there is no reason for providers not to start with hybrid solutions once the necessary technologies have matured sufficiently. We can hence distinguish between the following deployment types:

Private Clouds are typically owned by the respective enterprise and / or leased. Functionalities are not directly exposed to the customer, though in some cases services with cloud-enhanced features may be offered; this is similar to (Cloud) Software as a Service from the customer's point of view. Example: eBay.

Public Clouds. Enterprises may use cloud functionality from others, or offer their own services to users outside the company. Providing users with the actual capability to exploit the cloud features for their own purposes also allows other enterprises to outsource their services to such cloud providers, reducing the cost and effort of building up their own infrastructure. As noted in the context of cloud types, the scope of functionalities may differ. Examples: Amazon, Google Apps, Windows Azure.

Hybrid Clouds. Though public clouds allow enterprises to outsource parts of their infrastructure to cloud providers, the enterprises at the same time lose control over the resources and over the distribution / management of code and data. In some cases this is not desirable. Hybrid clouds consist of a mixed employment of private and public cloud infrastructures, so as to achieve maximum cost reduction through outsourcing while maintaining the desired degree of control over, e.g., sensitive data by keeping it in local private clouds. Not many hybrid clouds are actually in use today, though initial initiatives such as the one by IBM and Juniper already introduce base technologies for their realization.

Community Clouds. Typically, cloud systems are restricted to the local infrastructure, i.e. providers of public clouds offer their own infrastructure to customers. Though a provider could resell the infrastructure of another provider, clouds do not aggregate infrastructures to build larger, cross-boundary structures. Smaller SMEs in particular could profit from community clouds to which different entities contribute their respective (smaller) infrastructures. Community clouds can aggregate either public clouds or dedicated resource infrastructures, and we may thereby distinguish between private and public community clouds. For example, smaller organizations may come together to pool their resources to build a private community cloud; as opposed to this, resellers such as Zimory may pool cloud resources from different providers and resell them. Community Clouds as such are still just a vision, though there are already indicators of such a development, e.g. through Zimory and RightScale, and community clouds show some overlap with Grid technology (see e.g. Reservoir).

Special Purpose Clouds. IaaS clouds originating from data centres in particular have a general-purpose appeal, as their capabilities can be used equally across a wide scope of use cases and customer types. As opposed to this, PaaS clouds tend to provide functionalities more specialized to specific use cases. This should not be confused with proprietariness of the platform: specialization implies providing additional, use-case-specific methods, whilst proprietariness implies that the structure of data and interfaces are specific to the provider. Specialized functionalities are provided, e.g., by the Google App Engine, which offers specific capabilities dedicated to distributed document management. As with general service provisioning (web based or not), it can be expected that future systems will provide even more specialized capabilities to attract individual user areas, owing to competition, customer demand and available expertise. Special Purpose Clouds are simply extensions of normal cloud systems that provide additional, dedicated capabilities; the basis of such development is already visible. [2]

Why cloud services are popular

Cloud services are popular because they can reduce the cost and complexity of owning and operating computers and networks. Since cloud users do not have to invest in information technology infrastructure, purchase hardware, or buy software licences, the benefits are low up-front costs, rapid return on investment, rapid deployment, customization, flexible use, and solutions that can make use of new innovations.
In addition, cloud providers that have specialized in a particular area (such as e-mail) can bring advanced services that a single company might not be able to afford or develop. Some other benefits to users include scalability, reliability, and efficiency. Scalability means that cloud computing offers virtually unlimited processing and storage capacity. The cloud is reliable in that it enables access to applications and documents anywhere in the world via the Internet. Cloud computing is often considered efficient because it allows organizations to free up resources to focus on innovation and product development.

Another potential benefit is that personal information may be better protected in the cloud. Specifically, cloud computing may improve efforts to build privacy protection into technology from the start, through the use of better security mechanisms. Cloud computing will enable more flexible IT acquisition and improvements, which may permit adjustments to procedures based on the sensitivity of the data. Widespread use of the cloud may also encourage open standards for cloud computing that establish baseline data security features common across different services and providers. Cloud computing may also allow for better audit trails. In addition, information in the cloud is not as easily lost (compared with paper documents or hard drives, for example).

Potential privacy risks

While there are benefits, there are privacy and security concerns too. Data travels over the Internet and is stored in remote locations. In addition, cloud providers often serve multiple customers simultaneously. All of this may raise the scale of exposure to possible breaches, both accidental and deliberate. Many have raised the concern that cloud computing may lead to function creep: uses of data by cloud providers that were not anticipated when the information was originally collected and for which consent has typically not been obtained. Given how inexpensive it is to keep data, there is little incentive to remove information from the cloud and more reason to find other things to do with it. Security issues, the need to segregate data when dealing with providers that serve multiple customers, and potential secondary uses of the data are areas that organizations should keep in mind when considering a cloud provider and when negotiating contracts or reviewing terms of service. Given that the organization transferring this information to the provider is ultimately accountable for its protection, it needs to ensure that the personal information is appropriately handled. [3]

How Data Is Stored in the Cloud

Data is stored using Big Data technology.

Big Data: Big data, which admittedly means many things to many people, is no longer confined to the realm of technology. Today it is a business priority, given its ability to profoundly affect commerce in the globally integrated economy.
In addition to providing solutions to long-standing business challenges, big data inspires new ways to transform processes, organizations, entire industries and even society itself. In this paper we describe what is meant by big data and its types, and how to build a platform for big data. We also point to two current big data solutions: the Big Data Strategy and Oracle Big Data Solutions. The Big Data Strategy provides a plan for implementing big data in the right manner and identifies opportunities for Australian government agencies and for future work, whereas Oracle Big Data Solutions uses Hadoop and NoSQL database technologies to implement big data.

What is Apache Hadoop?

Hadoop is a large-scale, open source software framework (Yahoo! has been the largest contributor to date) dedicated to scalable, distributed, data-intensive computing. It handles thousands of nodes and petabytes of data, and supports applications under a free license. There are three main Hadoop subprojects:

Hadoop Common: the common utilities package.
HDFS: the Hadoop Distributed File System, providing high-throughput access to application data.
MapReduce: a software framework for distributed processing of large data sets on computer clusters. [4]

Hadoop MapReduce

MapReduce is a programming model and software framework first developed by Google (Google's MapReduce paper was published in 2004). It is intended to facilitate and simplify the processing of vast amounts of data (petabytes of data, thousands of nodes) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Computational processing occurs on both unstructured data (file system) and structured data (database). [4]
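The map, shuffle and reduce phases of this programming model can be illustrated with a small, self-contained word-count sketch. This is ordinary Python run in one process, not actual Hadoop code; the function names and the in-memory shuffle step are our own simplification of what the framework performs across a cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_func(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield (word, 1)

def reduce_func(word, counts):
    """Reduce: sum all the counts emitted for one key."""
    return (word, sum(counts))

def mapreduce(documents):
    # Map phase: run map_func over every input split.
    intermediate = [pair for doc in documents for pair in map_func(doc)]
    # Shuffle/sort: group intermediate pairs by key, as the framework would
    # do over the network before handing each key to a reducer.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reduce call per distinct key.
    return dict(
        reduce_func(word, (count for _, count in pairs))
        for word, pairs in groupby(intermediate, key=itemgetter(0))
    )

counts = mapreduce(["big data on hadoop", "hadoop stores big data"])
print(counts)  # {'big': 2, 'data': 2, 'hadoop': 2, 'on': 1, 'stores': 1}
```

In real Hadoop the Mapper and Reducer would be classes submitted as a job, and the shuffle happens between cluster nodes; the sketch above only mirrors the data flow, which is why reducers cannot start until every map has finished, exactly as noted in the limitations discussed in this paper.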

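The HDFS storage model (a file split into fixed-size blocks, each block replicated on several DataNodes, with the NameNode tracking only the metadata) can be sketched with a toy placement routine. Everything here, from the function names to the 4-byte block size and the round-robin placement, is an illustrative assumption rather than the real HDFS API; real HDFS defaults to 128 MB blocks and uses rack-aware replica placement.

```python
BLOCK_SIZE = 4   # bytes; tiny for demonstration (HDFS default is 128 MB)
REPLICATION = 3  # HDFS default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """NameNode-side view: a file is just an ordered list of blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Round-robin merely shows that every block lives on several hosts,
    which is what gives HDFS its fault tolerance; real HDFS placement
    also accounts for rack topology."""
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)]
            for r in range(replication)
        ]
    return placement

blocks = split_into_blocks(b"hello hadoop!")
print(blocks)  # [b'hell', b'o ha', b'doop', b'!']
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

If any single DataNode fails, each of its blocks still has two live replicas elsewhere, which is the reliability property the HDFS description in this paper attributes to replicating data across multiple hosts.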
Hadoop Distributed File System (HDFS)

HDFS is inspired by the Google File System: a scalable, distributed, portable file system written in Java for the Hadoop framework, and the primary distributed storage used by Hadoop applications. HDFS can be part of a Hadoop cluster or can be a stand-alone general-purpose distributed file system. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. HDFS stores very large files in blocks across machines in a large cluster; reliability and fault tolerance are ensured by replicating data across multiple hosts. It has data awareness between nodes and is designed to be deployed on low-cost hardware. Its assumptions and goals are: hardware failure, streaming data access, large data sets, and that moving computation is cheaper than moving data.

More on Hadoop file systems: Hadoop can work directly with any distributed file system that can be mounted by the underlying OS. However, doing this means a loss of locality, as Hadoop needs to know which servers are closest to the data. Hadoop-specific file systems like HDFS are developed for locality, speed, fault tolerance, integration with Hadoop, and reliability. [5]

What are Hadoop / MapReduce limitations?

You cannot control the order in which the maps or reductions are run. For maximum parallelism, Maps and Reduces must not depend on data generated in the same MapReduce job (i.e. they should be stateless). A database with an index will always be faster than a MapReduce job on unindexed data. Reduce operations do not take place until all Maps are complete (or have failed and been skipped). There is a general assumption that the output of Reduce is smaller than the input to Map: a large data source is used to generate smaller final values. [5]

Who's using it?

Lots of companies: Yahoo!, AOL, eBay, Facebook, IBM, Last.fm, LinkedIn, The New York Times, Ning, Twitter, and more. In 2007 IBM and Google announced an initiative to use Hadoop to support university courses in distributed computer programming. In 2008 this collaboration and the Academic Cloud Computing Initiative were funded by the NSF and produced the Cluster Exploratory Program (CLuE). [6][7]

Conclusion

As we discussed in this paper, cloud computing supports us in various respects. Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behaviour and identify market trends early on. But this influx of new data creates challenges for IT departments: to derive real business value from big data, you need the right tools to capture and organize a wide variety of data types from different sources, and to be able to analyze them easily within the context of all your enterprise data. Hadoop is a large-scale, open source software framework dedicated to scalable, distributed, data-intensive computing. The framework breaks up large data into smaller parallelizable chunks and handles scheduling; Maps each piece to an intermediate value; Reduces intermediate values to a solution; supports user-specified partition and combiner options; and is fault tolerant, reliable, and supports thousands of nodes and petabytes of data. If you can rewrite your algorithms as Maps and Reduces, and your problem can be broken into small pieces solvable in parallel, then Hadoop's MapReduce is the way to go for a distributed problem-solving approach to large datasets.

REFERENCES:

[1] www.priv.gc.ca
[2] www.cse.buffalo.edu/~bina/CloudComputingjun28
[3] Hadoop Distributed Filesystem. http://hadoop.apache.org
[4] http://www.forbes.com/sites/louiscolumbus/2014/02/24/the-best-cloud-computing-companies-and-ceos-to-work-for-in-2014/
[5] http://www.javacodegeeks.com/2012/05/mapreduce-for-dummies.html
[6] HDFS Java API: http://hadoop.apache.org/core/docs/current/api/
[7] HDFS source code: http://hadoop.apache.org/core/version_control.html