HADOOP INTO CLOUD: A RESEARCH
Prof. M. S. Malkar, H.O.D., PCP, Pune, msmalkar@rediffmail.com
Ms. Misha Ann Alexander, M.E. Student, Lecturer, PCP, Pune, annalexander33@gmail.com

Abstract- Hadoop and cloud computing have gained a lot of popularity in the IT industry. Hadoop, whose design was inspired by Google's MapReduce and Google File System papers, can handle large amounts of structured or unstructured data, known as BIG DATA. Hadoop spreads this data across the nodes of a cluster, which makes it possible to handle large amounts of data. The data is processed on the individual nodes and the results are returned as a single set. Cloud computing is defined as scalable, heterogeneous resources provided as services across the Internet with the needed bandwidth, security, computing capacity, and reliability. As part of green computing, IT companies are trying to reduce energy use and the cost of new infrastructure, software licenses, and so on. Cloud computing has therefore gained popularity, as companies use these services as required and pay per use. This paper examines how cloud servers can be used efficiently as Hadoop clusters for storing and processing large amounts of data.

I. INTRODUCTION

We are in a decade of digital data known as BIG DATA. Data grows with time and is at the heart of business, scientific research, search engines, and so on, and hence is a very important factor. Nowadays we interact with terabytes, petabytes, or even zettabytes of data. A traditional RDBMS may not deliver the desired output at this scale. Hadoop addresses this problem: it makes efficient use of resources by dividing the data, storing it across the nodes of a cluster, and processing it in parallel [3]. Big data is too large to operate on and manage with ordinary systems; managing it requires large servers [2]. Cloud computing provides large infrastructures for storing data and running services.
This paper explains what Hadoop and cloud computing are, and how these two existing technologies can be combined to increase the efficiency of data processing and resource utilization. The remainder of the paper is structured as follows. Section II describes Hadoop, Section III describes cloud computing, Section IV describes Hadoop into cloud, and Section V concludes the paper.

II. HADOOP

A. Why Hadoop evolved

As data grows, it becomes very difficult to process with a traditional RDBMS. Relational database management systems emerged in the 1970s and are still in use, but an RDBMS faces a "unit of analysis" problem whose symptoms are:
1. Counting distinct records becomes very difficult when the data is very large.
2. The "Alter Table of Doom": if a table is large and has many columns, the DBMS takes a long time to alter it, and even longer if it has a NOT NULL constraint.
3. The RDBMS cannot step through cursors row by row when the data is very large.
4. Data merge and mash-up: many businesses hold both online and offline data whose patterns or structures differ, and an RDBMS cannot combine structured and unstructured data into one analytical model.
These problems can be solved by Hadoop [3].

B. HADOOP

Hadoop is a complex set of software with more than 200 tunable parameters. The Hadoop framework can run applications on very large clusters of computers. Hadoop is a distributed system that stores large amounts of data across the nodes of a cluster and analyzes it in order to extract meaningful information. It is highly scalable and fault tolerant [4].

B. Hadoop Cluster

A cluster is a group of independent computer systems, referred to as nodes, working together as a unified computing resource.

C. Hadoop Data Model

Hadoop does not require a particular data format; it handles various file formats, from text files to database files, stored across the nodes of the cluster. To increase Hadoop's processing power, you just add more servers to the Hadoop cluster. Hadoop can be faster than other data processing systems: with an RDBMS, data from multiple sites must first be converted into a rigid tabular format before analysis can be performed [6].
The software divides the data into pieces and stores them across the cluster; it also keeps track of where the data resides, as there are multiple replicated copies. Hadoop runs on a number of machines that do not share memory or disks, hence it is cheaper than distributed database systems.

D. Hadoop Components

i. Map Reduce Model

Data is processed in two phases by the Map Reduce framework. The input data set is split into independent chunks, which are processed in parallel by the map tasks.
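As a rough illustration of the two phases, the following Python sketch counts words in a list of in-memory chunks. It is not Hadoop code: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) and the sample chunks are hypothetical, and the chunks are processed one after another here, whereas a cluster would run the map tasks in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(mapped_pairs):
    """Group the mapped values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    """Run map over every chunk, then sort/group, then reduce each group."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped)


# Hypothetical input: two chunks standing in for two splits of a data set.
chunks = ["big data needs hadoop", "hadoop splits big data"]
counts = word_count(chunks)
print(counts)
```

Each chunk is mapped independently of the others, which is what allows a real cluster to hand different chunks to different nodes.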
Fig1:- Hadoop data model

The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The Map Reduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. When a job is submitted, the JobTracker schedules its tasks on the slaves and monitors them. A TaskTracker is the manager for all tasks on a given node. Each task is an individual map or reduce operation [5].

ii. Hadoop Distributed File System (HDFS)

A file system controls storage and retrieval of data. It organizes the data in the storage area, names the stored information, and keeps track of the beginning and end of every record. HDFS differs from other file systems in that it is designed to be deployed on low-cost hardware. HDFS holds gigabytes to terabytes of data and is also fault tolerant. An HDFS instance consists of several servers, each storing part of the file system's data [2].

Fig2:- HDFS Architecture

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients.
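The split of a file into blocks and the NameNode's block-to-DataNode mapping can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: ToyNameNode, its method names, the tiny block size, and the round-robin placement are hypothetical and are not HDFS's actual API or placement policy; only the default of three replicas per block is taken from the paper.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Keeps only metadata: which DataNodes hold which block of which file."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Never ask for more replicas than there are DataNodes.
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # (filename, block index) -> list of DataNode names
        self._next = 0       # round-robin cursor (an assumed placement policy)

    def add_file(self, filename, data, block_size):
        """Record a placement for every block of the file; return the block count."""
        blocks = split_into_blocks(data, block_size)
        for index in range(len(blocks)):
            targets = [
                self.datanodes[(self._next + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            self.block_map[(filename, index)] = targets
            self._next += 1
        return len(blocks)

    def locate(self, filename, index):
        """Answer a client's question: which DataNodes serve this block?"""
        return self.block_map[(filename, index)]


# Hypothetical cluster of four DataNodes and a 10-byte file with 4-byte blocks.
namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
block_count = namenode.add_file("log.txt", b"x" * 10, block_size=4)
print(block_count, namenode.locate("log.txt", 0))
```

The sketch stores no file contents in the NameNode object, mirroring the division of labour described above: the master holds the namespace and block map, while the DataNodes hold the blocks themselves.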
The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode [7].

III. CLOUD COMPUTING

Cloud computing is a new trend in which hardware and software outside our own environment are accessed across the Internet. The service is paid for according to use.
This platform hides its complexity from the user; for example, our data is stored on the cloud and we can access it anywhere, anytime. Cloud helps us use services without investing in expensive infrastructure. Public cloud is operated by a service provider or cloud provider over the Internet. Private cloud is owned by a particular organization or a third party. Community cloud is operated by a number of organizations, with services provided only to a particular group; these services can be owned by a private organization or a cloud service provider. Hybrid cloud is a combination of two or more of these models, such as private and community cloud.

A. Services provided by cloud

In cloud computing, services are reused by several users across the network. Cloud provides scalability and multitenancy, which allow users to access the system from different hardware devices, e.g. Gmail and CRM. Users access these services by paying per use.

i. Software as a Service (SaaS)
A complete application is offered as a service on demand. A single instance of the software runs on the cloud and serves multiple end users or client organizations, e.g. Gmail and CRM. Users access these services by paying per use.

ii. Platform as a Service (PaaS)
It offers a development platform over the cloud. It encapsulates a layer of software and provides it as a service that can be used to build higher-level services. The platform is produced by integrating an OS, middleware, application software, and even a development environment.

iii. Infrastructure as a Service (IaaS)
Infrastructure as a Service delivers basic storage and computing capabilities as standardized services over the network. It involves offering hardware-related services using the principles of cloud computing. These could include storage services (database or disk storage) or virtual servers.

iv. Cloud Application Program Interface
The cloud infrastructure is programmable [18].

IV. HADOOP INTO CLOUD

There are many cloud providers, such as Amazon, Microsoft, and Google, that provide enterprises with cloud services. The cloud is essentially an array of servers where huge amounts of data can be stored. The main intention of these enterprises is data analysis, for which the data in the cloud has to be given as input to a data processing system. Hadoop is a data processing system that needs clusters of servers for processing big data, and the cloud can easily provide them. We can therefore combine the two existing technologies. This is motivating many cloud providers to deploy Hadoop into the cloud. Hadoop is written in Java, and programmers can write jobs against its Java API or in Python. It provides the application programmer with Map and Reduce functions. The MapReduce capability is available in many languages, such as Java and Python [8].

A. Challenges of Deploying Hadoop into Cloud
1. Hadoop was originally architected for the world of "big iron", but cloud infrastructure depends upon virtualization to manage and present an aggregation of infrastructure components.
2. Hadoop data nodes (servers) are installed in racks. A rack contains multiple data servers with a top-of-rack switch used for data communication. Hadoop needs rack awareness because it writes each block of data to three DataNodes by default, placed on different physical racks. Replicating data across multiple racks prevents data loss due to DataNode or rack failure.
3. Performance of a Hadoop cluster may be better on dedicated hardware, but the agility of running it in the cloud on demand may outweigh that limitation for some workloads.
Considering the above challenges, we have to find ways to deploy Hadoop on cloud.

B. Deploying Hadoop on cloud

Rackspace Private Cloud helps implement Hadoop on cloud quickly using open source software. OpenStack is an open cloud standard that helps build both private and public clouds. A private cloud is reserved for your data alone, while a public cloud holds data from many users and therefore needs more security. Apache Hadoop processes large amounts of structured and unstructured data. Apache projects such as Hive, Pig, and HBase provide tools to manipulate data. Hadoop was originally architected for static, predictable infrastructure, but virtualization is beneficial [11].

C. Cloud creation and Hadoop Installation

Rackspace Private Cloud Software (RPCS) is free, open source software that can be used to launch a cloud powered by OpenStack. RPCS provides the same cloud platform that powers Rackspace's public cloud, the largest open cloud deployment in the world [11].

Steps to Install Hadoop
1. You should have 3-4 systems on a network, each running Ubuntu with the sun-java6-jdk and ssh packages installed. A fresh system installation is preferable, though not mandatory.
2. Create a new user for Hadoop work. This step is optional but recommended, as it keeps the HADOOP_HOME path the same across the cluster.
3. Extract the Hadoop distribution in your home folder.
4. Now repeat these steps on all the nodes.
(Make sure that HADOOP_HOME is the same on all the nodes.)
5. We need the IPs of all 3 nodes. Let them be *.1.5, *.1.6, and *.1.7, where *.1.5 is the NameNode and *.1.6 is the JobTracker; these 2 are the main exclusive servers. Node *.1.7 is the DataNode, which is used both for task tracking and for storing data.
6. You will find a file called "hadoop-site.xml" under the conf directory of the Hadoop distribution. Copy and paste the following contents between <configuration> and </configuration>:
<property>
  <name>fs.default.name</name>
  <!-- IP of the NameNode -->
  <value>hdfs:// :9090</value>
  <description></description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <!-- IP of the JobTracker -->
  <value> :9050</value>
  <description></description>
</property>
<property> </property>
7. Make sure the same is done for all the nodes in the system.
8. Now create the slaves list so that the NameNode can replicate the data. Go to the HADOOP_HOME directory on the NameNode. Under the folder "conf" you should see a file called slaves.
9. Upon opening slaves, you should see a line with "localhost". Add the IPs of all the DataNodes you wish to connect to the cluster here, one per line. Then start the cluster.
10. Open a terminal on the NameNode and go to HADOOP_HOME.
11. Execute the following commands:
# Format the HDFS in the namenode
$ bin/hadoop namenode -format
# Start the Distributed File System service on the NameNode, which will ask you for the passwords for itself and all the slaves, to connect via SSH
$ bin/start-dfs.sh
12. Your NameNode should start and be running.
13. Next is the JobTracker node. Execute the following commands:
# Start the Map/Reduce service on the JobTracker
$ bin/start-mapred.sh
14. The same process follows for the JobTracker. It asks for the passwords for itself and all its slaves.
15. Now that we are done starting the cluster, it is time to check it out. On the NameNode execute the following command:
# Copy a folder (conf) to HDFS - For sample purpose
$ bin/hadoop fs -put conf input
16. If you open the NameNode's web interface in your browser, you should see the Hadoop HDFS Admin interface. It is a simple interface created to meet the purpose. It shows you the Cluster Summary, Live and Dead Nodes, etc.
17. You can browse the HDFS using the link in the top-left corner.
18. Go to the JobTracker's web interface to view the Hadoop Map/Reduce Admin Interface. It displays the current jobs, finished jobs, etc.
19. Now check the Map/Reduce process by executing some code [12].

V. CONCLUSION

With the increasing adoption of cloud, data is often already stored on cloud servers, and analyzing it where it already resides is cost efficient; deploying Hadoop on cloud can therefore reduce the cost of moving big data from one location to another. As we are in the era of green computing, where every enterprise is cutting costs, deploying Hadoop on cloud is a good way to do so.
Many cloud providers have started working on this in order to offer customers Hadoop-in-cloud solutions. Many solutions have already been found to run Hadoop on cloud, and more are expected by the end of this year.

VI. REFERENCES
[1] James Tuner, Hadoop Architecture and its data applications, 12 Jan 2011.
[2] Nati Shaloms Blog, 28 Aug 2012.
[3] Rakesh Rao, Big data platforms Blog, When to Use Hadoop Instead of a Relational Database Management System (RDBMS), 1/10/13.
[4] Miha Ahrozovitz, Kuldip Pable, Hadoop.
[5] Y. Elababakh, Hadoop Framework, Spring.
[6] Brain Profit, The real reason Hadoop is such a big deal in big data, May.
[7] HDFS Architecture Guide, 8 Apr.
[8] Dr Khalil E. Khalib, Aba Yagoub, Hadoop@uoit.
[9] Eugene Ciurana, Apache Hadoop Deployment: A Blueprint for Reliable Distributed Computing.
[10] Ven Varma, Big Data and Analysis Hub, g-hadoop-cloud.
[11] Private Cloud team, Apache Hadoop on Rackspace Private Cloud.
[12] Hadoop Cluster Deployment + Step-By-Step Process, July, p-cluster-deployment-step-by-step.html
[13] Chen Zhang, Hans De Sterck, Ashraf Aboulnaga, Haig Djambazian, and Rob Sladek, Case Study of Scientific Data Processing on a Cloud Using Hadoop.
[14] Dell Cloudera Solution Reference Architecture v2.1.0, A Dell Reference Architecture Guide, November 2012.
[15] Sun Microsystems, Introduction to cloud computing architecture, whitepaper, 1st edition, June 2009.
[16] Grant Ingersoll, Chief Scientist for LucidWorks, and Ted Dunning, Chief Application Architect for MapR, Crowd Sourcing Reflected Intelligence Using Search and Big Data.
[17] Open Logic, Top 10 Lessons Learned from Deploying Hadoop in a Private Cloud.ppt
[18] pes.html