A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM
|
|
- Gervase Payne
- 8 years ago
- Views:
Transcription
1 A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal Abstract Cloud computing, data and distributed systems are three important aspects of this paper. Cloud computing is being embraced by every organization and is being implemented in every field of work, be it in business or in education. Data storage and processing is fundamental task of any organization. Hadoop is a distributed framework created to handle the big data processing task. The aim of this paper is to study and analyze different aspects such as performance, flexibility, scalability and more on Hadoop clusters in the cloud and in commodity. Introduction Cloud computing is an abstract term describing the use of resources, which don t belong to the user to perform required task and then disconnect from the resources not in use. Buyya et al. [1] have defined it as follows: Cloud is a parallel and distributed computing system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements (SLA) established through negotiation between the service provider and consumers. Vaquero et al. [2] have stated clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized Service Level Agreements. Cloud computing [3] technology started very early when mainframe computers were available in academia and corporations, accessible through clients computers to be shared. Cloud computing [5] is based on different computing research areas such as HPC, virtualization, utility computing, grid computing, networking, security and many others. Depending on service providers cloud computing can be broadly divided into: [4] Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service(IaaS). SaaS provides both resources and applications as a Service to the clients. PaaS provides a level above SaaS by enabling clients to access to platforms that they need to develop. And IaaS provides the storage and computing infrastructure to the clients over the internet. Based on technology used cloud computing can be subcategorized into: [4] Public, Private, Community and Hybrid clouds. A public cloud is accessible with an internet connection. A private cloud is limited to an organization or group. A community cloud is a private cloud shared between groups or organizations. And hybrid cloud is a mixture of at least two clouds explained above. There are many platforms available to set up a cloud. Openstack was chosen due to following reasons: First of all, it is open-source, meaning it is open to pick and mix any hardware needs, open to design own networks, open to use any virtualization technology, open to other needed features and so on. Secondly, it has the largest group of developers and contributors. Thirdly it is simple to configure and use. Data is an inseparable entity of any organization. According to [6] Every day, we create 2.5 quintillion bytes of data. The obvious cause is the advancement of technology and its use. The piled up data needs to be stored,
2 processed and analyzed to get useful information. There are many technical solutions to it but Hadoop is chosen due to following reasons: Hadoop makes data mining, analytics, and processing of big data cheap and fast. Hadoop is an open source project and is made to deal with terabytes of data in minutes. Hadoop is the only way that companies with gigantic amounts of data like Facebook, Twitter, Yahoo, ebay and Amazon can cost-effectively and quickly make decisions. Hadoop is easily scalable as hard drives and nodes can be added without need of shutting down Hadoop. Hadoop stores and processes any kind of data. Hadoop is natively written in Java but can be accessed using other languages such as SQL-inspired language (Hive), c/c++, python and many more. Knowledge on Hadoop is a must to understand the paper. The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It provides for building distributed system for data storage, data analysis, and coordination. A framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of Map-Reduce. Hadoop employs a master/slave architecture for both distributed storage and distributed computation. This paper is divided into five parts, Introduction, cloud, cluster, analysis and finally conclusion. The cloud section explains in brief about Openstack cloud. The cluster section explains the clustering of Hadoop system in real computers and in the cloud and also explains running of applications on the distributed systems. The analysis section describes the comparison of two system with appropriate graphs. The cloud According to the official Openstack starter guide [7] Cloud computing is a computing model, where resources such as computing power, storage, network and software are abstracted and provided as services on the Internet in a remotely accessible fashion.. An infrastructure setup using the cloud computing model is generally referred to as the "cloud". The guide further explains that Openstack is a collection of open source software projects that enterprises/service providers can use to setup and run their cloud compute and storage infrastructure. The installation of cloud (Openstack) consists of five projects: nova-compute, swift, glance, keystone and Horizon which is more clear with the following figures. Fig 1 simple Openstack architecture Fig 2 simple Openstack workflow
3 There are many reasons behind choosing Openstack. The first reason is the flexibility in Openstack regarding networking and hardware. Open-source is the second reason with millions of developers and contributors. Thirdly Openstack has become phenomenon meaning that big companies like Dell, AMD, Cisco, HP and others alongside Rackspace are using it. Furthermore, linux heavyweights like Red Hat and Ubuntu are implementing Openstack. Lastly but not the least, it is simple and flexible to configure and use. The clusters After the successful installation and configuration of Openstack cloud, virtual servers were created in the cloud. For this experiment, under Hadoop project, four instances were created, one acting as master/slave and the rest three as slaves. Four-node cluster was created for the Hadoop jobs to be run and analysed. The same configuration was applied to personal computers as well. The cluster in the cloud was made identical to the cluster in the real systems. The details of the clusters are as given in the table below. Table 1: details of the clusters Servers cloud vs real Details (personal computers) Details (virtual servers in cloud) Master vs kucse-dcg 2 GB Ram 2 VCPU 160 GB storage 2 GB Ram 2 VCPU 80 GB storage Slave1 vs user 2 GB Ram 2 VCPU 80 GB storage 2 GB Ram 2 VCPU 80 GB storage Slave2 vs user1 2 GB Ram 2 VCPU 80 GB storage 2 GB Ram 2 VCPU 80 GB storage Slave3 vs user3 2 GB Ram 2 VCPU 80 GB storage 2 GB Ram 2 VCPU 80 GB storage Screenshot of the dashboard of the cloud is given below. Fig 3: dashboard of Kathmandu University cloud with Hadoop project
4 After the successful configuration of the clusters, three jobs: two jobs to convert image files to pdf files and one job of word count were run on both systems. The first two jobs were based on image and pdf files being serialized in map reduce framework [9][10][11], and the last job was available in the Hadoop package itself. The results were not so much contradictory which are summarized in the tables given below. Table 2: summary of first job Cluster in personal computers Cluster in virtual servers in cloud inputs 23 folder 94 image files 169 MB 23 folder 94 image files 169 MB outputs 23 folder 94 pdf files 90.1 MB 1 folder 94 pdf files 90.1 MB Time taken 3 minutes and 8 seconds 1 minute and 31 seconds Table 3: summary of second job Cluster in personal computers Cluster in virtual servers in cloud inputs 1 folder 476 image files 926 MB 1 folder 476 image files 926 MB outputs 1 pdf files MB 1 pdf files MB Time taken 7 minutes and 51 seconds 9 minutes and 22 seconds Table 4: summary of third job Cluster in personal computers Cluster in virtual servers in cloud inputs 1 text file 1.1 GB 1 text file 1.1 GB outputs 1 text file with counts KB 1 text file with counts KB Time taken 4 minutes and 0 second 5 minutes and 1 second Three graphs generated from the above tables are given below. Graph 1 time taken for the first job Graph 2 time taken for the second job Graph 3 time taken for the third job
5 Analysis The Hadoop distributed system set up on personal computers was certain to be more efficient and faster than the cloud system. The obvious reasons were that Hadoop framework is developed with commodity machines in mind and that the processing is done in real hardware without any resources sharing as compared to cloud systems. The first job was contradictory with the points discussed above and with other two jobs. The reason is that the job has to recursively read and write files, thus has to cache all the bytes read and due to hardware inefficiency in the personal computers used (especially memory), the job has to be canceled from the inefficient machines (task trackers). Thus the job was a bit slow in real computers due to cancellation of jobs and transferring the job to other tasktrackers (nodes/machines). To sum up, the personal computers Hadoop cluster was as powerful and fault tolerant as the cloud Hadoop cluster but not as scalable and flexible as the cloud cluster. The reasons were that the real systems are of real hardware and can not be made available easily but the cloud cluster was of virtualized system so could be varied according to needs and availability. Creating a new virtual server (node) in the cloud could be done in seconds but adding a new node in the real system had to account the availability of computers. A problem in virtual server can be solved by terminating and creating a new one but the same could not be applied in real system. The real system were connected with network cables so had a little but insignificant delay in communication between nodes in real system while the nodes were inside the same project which made the communication between nodes much faster. Most important aspect of the cloud is that server creation and termination is extraordinarily easy i.e. the cluster can be scaled up and down with great ease. Another aspect of the cloud is that the cluster can be made public without any difficulty as we can see in the above screenshot that the master server s web interface is accessed through browser in the client. Conclusion This paper is an analysis of running a Hadoop cluster in cloud and in real system and identifying the best solution by running simple Hadoop jobs in the configured clusters. This paper concludes that running a Hadoop cluster in cloud for data storage and analysis is more flexible and easily-scalable than the real system cluster. This paper also concludes the cluster in real system computers are faster than the cloud clusters. But due to different advantageous features of the cloud computing system such as quick termination of servers (nodes) if problems arise and creation of the node from the same state the machine was terminated, automatic networking, instant creation of nodes and cluster and many such features cloud Hadoop cluster would be more favorable. References [1] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems, 25:599_616, [2] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, A break in the clouds: Towards a cloud definition, SIGCOMM Computer Communications Review, 39:50_55, [3] Cloud Computing on Wikipedia, en.wikipedia.org/wiki/cloudcomputing [4] [5] Cloud Computing: Principles and Paradigms, Edited by Rajkumar Buyya, James Broberg and Andrzej Goscinski
6 [6] [7] Openstack compute starter guide, May (essex) [8] [9] Tom White, Hadoop: The Definitive Guide, O Reilly Media, 2009 [10] Chuck Lam, Hadoop in Action, MEAP Unedited Draft, 2010 [11] Bruno Lowagie, itext in Action, 2010
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationOpen Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)
Open Cloud System (Integration of Eucalyptus, Hadoop and into deployment of University Private Cloud) Thinn Thu Naing University of Computer Studies, Yangon 25 th October 2011 Open Cloud System University
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationEXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics
BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents
More informationSistemi Operativi e Reti. Cloud Computing
1 Sistemi Operativi e Reti Cloud Computing Facoltà di Scienze Matematiche Fisiche e Naturali Corso di Laurea Magistrale in Informatica Osvaldo Gervasi ogervasi@computer.org 2 Introduction Technologies
More informationGrid Computing Vs. Cloud Computing
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid
More informationData-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion
More informationL1: Introduction to Hadoop
L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationCisco Integration Platform
Data Sheet Cisco Integration Platform The Cisco Integration Platform fuels new business agility and innovation by linking data and services from any application - inside the enterprise and out. Product
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationLecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationMobile Cloud Computing T-110.5121 Open Source IaaS
Mobile Cloud Computing T-110.5121 Open Source IaaS Tommi Mäkelä, Otaniemi Evolution Mainframe Centralized computation and storage, thin clients Dedicated hardware, software, experienced staff High capital
More informationFinding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics
Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationHadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT
Hadoop on OpenStack Cloud Dmitry Mescheryakov Software Engineer, @MirantisIT Agenda OpenStack Sahara Demo Hadoop Performance on Cloud Conclusion OpenStack Open source cloud computing platform 17,209 commits
More informationAnalysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms
Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of
More informationMapReduce, Hadoop and Amazon AWS
MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables
More informationBig Data and Cloud Computing for GHRSST
Big Data and Cloud Computing for GHRSST Jean-Francois Piollé (jfpiolle@ifremer.fr) Frédéric Paul, Olivier Archer CERSAT / Institut Français de Recherche pour l Exploitation de la Mer Facing data deluge
More informationCloud Computing Backgrounder
Cloud Computing Backgrounder No surprise: information technology (IT) is huge. Huge costs, huge number of buzz words, huge amount of jargon, and a huge competitive advantage for those who can effectively
More informationCLOUD COMPUTING. When It's smarter to rent than to buy
CLOUD COMPUTING When It's smarter to rent than to buy Is it new concept? Nothing new In 1990 s, WWW itself Grid Technologies- Scientific applications Online banking websites More convenience Not to visit
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationBIG DATA SOLUTION DATA SHEET
BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationCloud Computing : Concepts, Types and Research Methodology
Cloud Computing : Concepts, Types and Research Methodology S. Muthulakshmi Bangalore,Karnataka India- 560068 Abstract: Cloud -computing is a very popular term in this modern and computer world in IT solution
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationComparing Ganeti to other Private Cloud Platforms. Lance Albertson Director lance@osuosl.org @ramereth
Comparing Ganeti to other Private Cloud Platforms Lance Albertson Director lance@osuosl.org @ramereth About me OSU Open Source Lab Server hosting for Open Source Projects Open Source development projects
More informationKeywords: Cloudsim, MIPS, Gridlet, Virtual machine, Data center, Simulation, SaaS, PaaS, IaaS, VM. Introduction
Vol. 3 Issue 1, January-2014, pp: (1-5), Impact Factor: 1.252, Available online at: www.erpublications.com Performance evaluation of cloud application with constant data center configuration and variable
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationFrom Wikipedia, the free encyclopedia
Page 1 sur 5 Hadoop From Wikipedia, the free encyclopedia Apache Hadoop is a free Java software framework that supports data intensive distributed applications. [1] It enables applications to work with
More informationEnhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
More informationA Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
More informationAnalysing Large Web Log Files in a Hadoop Distributed Cluster Environment
Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
More informationCertified Cloud Computing Professional VS-1067
Certified Cloud Computing Professional VS-1067 Certified Cloud Computing Professional Certification Code VS-1067 Vskills Cloud Computing Professional assesses the candidate for a company s cloud computing
More informationComparing Open Source Private Cloud (IaaS) Platforms
Comparing Open Source Private Cloud (IaaS) Platforms Lance Albertson OSU Open Source Lab Associate Director of Operations lance@osuosl.org / @ramereth About me OSU Open Source Lab Server hosting for Open
More informationSriram Krishnan, Ph.D. sriram@sdsc.edu
Sriram Krishnan, Ph.D. sriram@sdsc.edu (Re-)Introduction to cloud computing Introduction to the MapReduce and Hadoop Distributed File System Programming model Examples of MapReduce Where/how to run MapReduce
More informationBig Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013
Big Data Use Case How Rackspace is using Private Cloud for Big Data Bryan Thompson May 8th, 2013 Our Big Data Problem Consolidate all monitoring data for reporting and analytical purposes. Every device
More informationRole of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
More informationBIG DATA USING HADOOP
+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationViswanath Nandigam Sriram Krishnan Chaitan Baru
Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance
More informationThe Client Side of Cloud Computing
Cloud Clients Service Look-Up Resumé Literature SE aus Informatik, SS 2009 26. Mai 2009 Cloud Clients Service Look-Up Resumé Literature 1 Cloud Clients Definition Hardware Clients Software Clients Software
More informationAGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
More informationDATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
More informationA Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services
A Study on Analysis and Implementation of a Cloud Computing Framework for Multimedia Convergence Services Ronnie D. Caytiles and Byungjoo Park * Department of Multimedia Engineering, Hannam University
More informationCloud computing: utility computing over the Internet
Cloud computing: utility computing over the Internet Taneli Korri Helsinki University of Technology tkorri@hut.fi Abstract Cloud computing has become a hot topic in the IT industry, as it allows people
More informationWeekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay
Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability
More informationHow To Understand Cloud Computing
Overview of Cloud Computing (ENCS 691K Chapter 1) Roch Glitho, PhD Associate Professor and Canada Research Chair My URL - http://users.encs.concordia.ca/~glitho/ Overview of Cloud Computing Towards a definition
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationBig Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani
Big Data and Hadoop Sreedhar C, Dr. D. Kavitha, K. Asha Rani Abstract Big data has become a buzzword in the recent years. Big data is used to describe a massive volume of both structured and unstructured
More informationJeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationCloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad
Cloud Computing: Computing as a Service Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad Abstract: Computing as a utility. is a dream that dates from the beginning from the computer
More informationA Survey on Cloud Computing
A Survey on Cloud Computing Poulami dalapati* Department of Computer Science Birla Institute of Technology, Mesra Ranchi, India dalapati89@gmail.com G. Sahoo Department of Information Technology Birla
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationDepartment of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul
More informationDepartment of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
More informationCloud computing: the state of the art and challenges. Jānis Kampars Riga Technical University
Cloud computing: the state of the art and challenges Jānis Kampars Riga Technical University Presentation structure Enabling technologies Cloud computing defined Dealing with load in cloud computing Service
More informationHow To Understand Cloud Computing
Dr Markus Hagenbuchner markus@uow.edu.au CSCI319 Introduction to Cloud Computing CSCI319 Chapter 1 Page: 1 of 10 Content and Objectives 1. Introduce to cloud computing 2. Develop and understanding to how
More informationTowards Comparative Evaluation of Cloud Services
Towards Comparative Evaluation of Cloud Services Farrukh Nadeem Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. Abstract:
More informationBig Data Analytics: Hadoop-Map Reduce & NoSQL Databases
Big Data Analytics: Hadoop-Map Reduce & NoSQL Databases Abinav Pothuganti Computer Science and Engineering, CBIT,Hyderabad, Telangana, India Abstract Today, we are surrounded by data like oxygen. The exponential
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT Samira Daneshyar 1 and Majid Razmjoo 2 1,2 School of Computer Science, Centre of Software Technology and Management (SOFTEM),
More informationHadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationCloud Computing using
Cloud Computing using Summary of Content Introduction of Cloud Computing Cloud Computing vs. Server Virtualization Cloud Computing Components Stack Public vs. Private Clouds Open Source Software for Private
More informationCloud computing - Architecting in the cloud
Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices
More informationMap Reduce & Hadoop Recommended Text:
Big Data Map Reduce & Hadoop Recommended Text:! Large datasets are becoming more common The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately
More informationHow To Compare Cloud Computing To Cloud Platforms And Cloud Computing
Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Cloud Platforms
More informationReducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan
Reducer Load Balancing and Lazy Initialization in Map Reduce Environment S.Mohanapriya, P.Natesan Abstract Big Data is revolutionizing 21st-century with increasingly huge amounts of data to store and be
More informationHeterogeneous Workload Consolidation for Efficient Management of Data Centers in Cloud Computing
Heterogeneous Workload Consolidation for Efficient Management of Data Centers in Cloud Computing Deep Mann ME (Software Engineering) Computer Science and Engineering Department Thapar University Patiala-147004
More informationHadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
More informationBig Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
More informationHadoop Distributed File System. Jordan Prosch, Matt Kipps
Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?
More informationIntroduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
More informationA Cost-Evaluation of MapReduce Applications in the Cloud
1/23 A Cost-Evaluation of MapReduce Applications in the Cloud Diana Moise, Alexandra Carpen-Amarie Gabriel Antoniu, Luc Bougé KerData team 2/23 1 MapReduce applications - case study 2 3 4 5 3/23 MapReduce
More informationBig Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
More informationA Middleware Strategy to Survive Compute Peak Loads in Cloud
A Middleware Strategy to Survive Compute Peak Loads in Cloud Sasko Ristov Ss. Cyril and Methodius University Faculty of Information Sciences and Computer Engineering Skopje, Macedonia Email: sashko.ristov@finki.ukim.mk
More informationA Gentle Introduction to Cloud Computing
A Gentle Introduction to Cloud Computing Source: Wikipedia Platform Computing, Inc. Platform Clusters, Grids, Clouds, Whatever Computing The leader in managing large scale shared environments o 18 years
More informationCloud on TEIN Part I: OpenStack Cloud Deployment. Vasinee Siripoonya Electronic Government Agency of Thailand Kasidit Chanchio Thammasat University
Cloud on TEIN Part I: OpenStack Cloud Deployment Vasinee Siripoonya Electronic Government Agency of Thailand Kasidit Chanchio Thammasat University Outline Objectives Part I: OpenStack Overview How OpenStack
More informationHPC ABDS: The Case for an Integrating Apache Big Data Stack
HPC ABDS: The Case for an Integrating Apache Big Data Stack with HPC 1st JTC 1 SGBD Meeting SDSC San Diego March 19 2014 Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox gcf@indiana.edu http://www.infomall.org
More informationVirtual Machine Instance Scheduling in IaaS Clouds
Virtual Machine Instance Scheduling in IaaS Clouds Naylor G. Bachiega, Henrique P. Martins, Roberta Spolon, Marcos A. Cavenaghi Departamento de Ciência da Computação UNESP - Univ Estadual Paulista Bauru,
More informationSEAIP 2009 Presentation
SEAIP 2009 Presentation By David Tan Chair of Yahoo! Hadoop SIG, 2008-2009,Singapore EXCO Member of SGF SIG Imperial College (UK), Institute of Fluid Science (Japan) & Chicago BOOTH GSB (USA) Alumni Email:
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationOpenStack IaaS. Rhys Oxenham OSEC.pl BarCamp, Warsaw, Poland November 2013
OpenStack IaaS 1 Rhys Oxenham OSEC.pl BarCamp, Warsaw, Poland November 2013 Disclaimer The information provided within this presentation is for educational purposes only and was prepared for a community
More information