Performance Analysis of Book Recommendation System on Hadoop Platform

Sugandha Bhatia #1, Surbhi Sehgal #2, Seema Sharma #3
Department of Computer Science & Engineering, Amity School of Engineering & Technology, Amity University, Noida, India.

Abstract- Recommendation engines are computationally intensive and hence well suited to the Hadoop platform. This research paper aims at building a book recommendation engine that uses data mining to recommend books. It gives its users the ability to upload and download engineering books as well as novels, and this activity is used to draw conclusions about the stream of a user and the genres of the books that user likes. It analyzes user behavior by means of content based filtering and applies association rules from data mining to produce individualized book recommendations. The database is then transferred to the Hadoop HDFS so that the read and write efficiency of the local system and of Hadoop can be compared.

Keywords - Association rules, Collaborative filtering, Content based filtering, Data mining, HDFS, Hadoop, Map Reduce, Recommendation engine.

1. INTRODUCTION

Hadoop is a large scale distributed batch processing infrastructure. It allows for the distributed processing of large data sets, called big data, across clusters of computers using Map Reduce and the HDFS file system. Hadoop can be used with data mining applications. It makes it possible to store unstructured data, such as images, in a structured form by making use of its file system and of other projects that work with Hadoop, such as HBase, Hive and Pig. In this book recommendation engine the focus is on analyzing the performance of the application on the local system and comparing it with the performance of the recommendation system after it is migrated to Hadoop. Hadoop provides a mechanism to analyze performance through its inbuilt plug-in, so no additional resource is required.
In the book recommendation engine, books are displayed according to the readers' preferences in a hierarchical way, in order to categorize readers' interest in different genres and the users' patterns of downloading different engineering books, and to form an effective set of rules from them. New books are presented according to users' needs. The book recommendation system is built on users' interests and the properties of the books. Furthermore, it lets users upload and download books and save a book in the wish list if the book is unavailable or the user wants to download it later. A star rating, based on the concept of data mining, is provided along with each book. The user is allowed to rate a book while downloading it or adding it to his wish list. The whole dataset is then transferred to the Hadoop HDFS so that the local system and the Hadoop system can be compared and the use of Hadoop in the recommendation system can be justified.

2. UNDERSTANDING HADOOP

The Apache Hadoop software library is a framework that allows for the distributed processing of big data across clusters of computers using simple programming models. It can scale from a single server to thousands of machines, each of which offers local processing and storage. Rather than relying on hardware components to deliver high availability, the Apache Hadoop software library is designed to detect and handle failures at the application layer, so it delivers a highly available service on top of a cluster of computers, each of which may be prone to failure. [1]

Hadoop makes it possible to run applications on systems with thousands of nodes involving terabytes of data. Its distributed file system, called HDFS, facilitates rapid data transfer rates among nodes and allows the system to continue to work uninterrupted in case of node failure.
When a record has to be searched in terabytes of data, retrieving it takes a long time, but with Hadoop the work can be done in parallel, which makes both computation and searching faster. The Hadoop framework can be used for log analysis, marketing analytics, data mining, image processing, web crawling etc.

2.1 HDFS IN HADOOP

HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework. The only source of input to the Hadoop framework is through its HDFS. HDFS stores large files as blocks of approximately 64 MB each. It achieves reliability by replicating the data among multiple hosts. The NameNode is the central part of an HDFS file system. It keeps the directory tree of all the files in the file system and tracks where file data is kept across the cluster. The Hadoop file system also includes the secondary NameNode, which connects to the primary NameNode to build snapshots of the primary NameNode's directory information. The DataNode is responsible for storing data in the HDFS. The JobTracker schedules map or reduce jobs to TaskTrackers with an awareness of the data location. Similarly, the TaskTracker accepts Map, Reduce and shuffle tasks from the JobTracker.

On start-up, a DataNode connects to the NameNode. It then responds to requests from the NameNode for file system operations. Client applications can talk directly to a DataNode once the location of the data has been provided by the NameNode. The JobTracker then schedules the Map Reduce jobs to the TaskTracker. MapReduce operations are allocated to TaskTracker instances near a DataNode, which talk directly to the DataNode to access the files. TaskTracker instances can be deployed on the same servers that host DataNode instances so that MapReduce operations are performed close to the data.

Fig. 2.1 HDFS operations

2.2 MAP REDUCE IN HADOOP

Hadoop makes use of the Map Reduce programming model for processing large datasets and for distributed computing on clusters of computers. Map Reduce takes advantage of the locality of data, processing data on or near the storage assets to decrease the transmission of data. It is a two step process, namely Map and Reduce. In the Map process, the master node takes the input from the HDFS file system, divides it into smaller sub-problems and distributes them to worker nodes. A worker node may divide its sub-problem again, process the smaller problems, and pass the answer back to its master node. The mapping process is performed by Mappers using key-value pairs. In the Reduce process, the master node collects the solutions to all the sub-problems and combines them in some way to form the solution to the original undivided problem. Reduction is performed by Reducers. The only mode of intercommunication between the nodes is during the shuffling process.

Fig. 2.1 Map Reduce mechanism

3. CURRENT RECOMMENDATION SYSTEMS

The Book Recommendation Systems implemented so far have used either similar user interests or item characteristics for matching with the user profile to give a future recommendation. The amount of data in a book recommendation system is a big problem. Displaying search results drawn from a million users and identifying the similar interests of a million users adds latency to the search and degrades the performance of the recommendation system. As recommendation systems are computationally intensive, a vast amount of data needs to be searched when search results have to be displayed. It becomes difficult to store such a large amount of data. Furthermore, storing books in a relational form is a big problem, as books and images are unstructured data. Books and images need to be stored as links in the relational database, and retrieving results through links adds further latency to the recommendation system.

Book Recommendation Systems help users manage their reading list by learning their preferences. There are two categories of such systems: one gives a list of recommendations based on the user profile in a library automation system, and the other tells the user specifically what he should read next according to his current requirements. There are recommendation systems like whichbook.net, what should I read next, lazy library, library think etc., each of which uses a particular strategy to fulfill the current requirements of the user.

Current recommendation methods mainly use two approaches: the collaborative filtering method [7] and the content-based method [8]. The collaborative filtering method is based on how other customers have rated a book on average. Thus, the collaborative filtering method can be used to recommend books without effective book information, if sufficient user feedback (ratings) is available. However, this approach is not suitable for recommending unpopular books that readers do not rate. On the other hand, the content-based method recommends books highly evaluated by the recommendation system using a user's own feedback. The content-based method is useful for finding unpopular books when other users' feedback is unavailable but effective book information is available. For example, if a user has a book about Algorithms and Data Structures and has rated it, the content-based recommendation can be applied. However, this approach may not be suitable for finding books in a new category, because the method depends on each user's interests. These two approaches are a better choice for promoting book purchase, but they are not always suitable for book reuse. [3]

4. PROPOSED SYSTEM

Book recommendation systems have developed rapidly due to Web technology, which provides a new way to acquire readers' demands. However, existing recommendation systems don't analyze the recommendation information and can't supply enough information for readers to decide whether to recommend a book or not. Some systems also lack a feedback mechanism for readers, and so do not satisfy the readers' needs. In order to solve these problems, a book recommendation system based on the user's interest has been proposed. The recommendation pages will contain all the essential and extended book information for readers to refer to. Readers can rate the book of their choice, and the star rating data from different users with similar patterns will be analyzed by the recommendation system to make a scientific recommendation decision. The application of the recommendation system shows that both the utilization of recommended books and readers' satisfaction were greatly increased.

Fig.4.1: Proposed Model of book recommendation service

The data input to the book recommendation system consists of engineering books as well as novels. The books are input to the book recommendation system through the database in which they are maintained. Data preprocessing describes any type of processing performed on raw data to prepare it for another processing procedure. Hadoop performs preprocessing by automatically clustering the whole data into clusters depending upon the kind of data. Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository.

Recommendation is based on the user's selection. If a customer selects an engineering book of a particular stream, he will be recommended other engineering books of the same domain. Example: if a customer selects the Digital Image Processing book by Gonzalez and Woods, the customer will be presented other books on Digital Image Processing by other authors. Similarly, if a person selects a novel to download, the customer will be recommended books by the same author and books of the same genres.

Finally, the performance of the local system is compared with that on the Hadoop platform. The book recommendation system will use a single node Hadoop cluster to store data in HDFS. Hadoop will allow the distributed processing of large data sets across clusters of computers using Map Reduce and the HDFS file system.
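The if/then association rules described in this section can be illustrated with a minimal sketch. The download histories, the thresholds and the function names `mine_rules` and `recommend` are hypothetical and are not taken from the paper's implementation; the sketch only shows how support and confidence over users' download sets could turn a selected book into recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical download histories: one set of book titles per user.
downloads = [
    {"Digital Image Processing", "Computer Vision"},
    {"Digital Image Processing", "Computer Vision", "Algorithms"},
    {"Digital Image Processing", "Algorithms"},
    {"Algorithms", "Data Structures"},
]


def mine_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples for book pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of users who downloaded both books
        if support < min_support:
            continue
        # Each frequent pair yields two candidate if/then rules.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


def recommend(selected, rules):
    """Books suggested by the rules whose antecedent is the selected book."""
    return [consequent for antecedent, consequent, _, _ in rules if antecedent == selected]


if __name__ == "__main__":
    rules = mine_rules(downloads)
    print(recommend("Digital Image Processing", rules))
```

The antecedent of a rule is matched against the book the user selected, and the consequent supplies the recommended book. On a real profile database the two thresholds would need tuning, since a low support cut-off on many users produces a very large rule set.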
In this book recommendation engine the focus will be on analyzing the performance of the application on the local system and comparing it with the performance of the recommendation system when data is read from or written to HDFS. A near approximation of Big Data is desired for best results. The larger the data, the better the performance of the recommendation system on Hadoop will be. Hadoop provides a mechanism to analyze the performance by entering the URL of the NameNode and JobTracker dashboard in the web browser. The NameNode runs on the port and the JobTracker runs on the port number by default.

In the book recommendation engine, books will be displayed according to the readers' preferences in a hierarchical way, in order to categorize readers' interest in different genres and the users' patterns of downloading different engineering books, and to form an effective set of rules from them. New books will be presented and classified so as to satisfy the needs of readers. Furthermore, the system lets users upload and download books and save a book in the wish list if the book is unavailable or the user wants to download it later. A star rating, based on the concept of data mining, is provided along with each book. The user is allowed to rate a book while downloading it or adding it to his wish list.
Appropriate recommendations will be presented simultaneously. Some of the proposed features of the Book Recommendation System are:

1) Performance analysis of the application on Hadoop
2) Easy management of the unstructured data by using Hadoop
3) Cluster formation implicitly done by Hadoop
4) Content based filtering
5) Advanced book recommendation
6) Customized web pages
7) Explicit data collection by asking a user to rate a book and create a list of books that he likes
8) Implicit data collection by observing the books that the user views and by keeping a record of the books that the user downloads
9) Analyzing the performance of the application on the local machine and on the Hadoop platform
10) Ability to upload and download a book

4.1 MODEL OF BOOK RECOMMENDATION SYSTEM

Fig. 4.1: Model of Book Recommendation System (blocks: User Interface, Data Mining, Profiles Database, Recommendation Module, Book Database; flows: User Information, Book Information, Antecedent of rule, Subsequent of rule)

The model of the book recommendation service is made up of data mining, rule database, user interface, rule matching, and book database. Fig. 4.1 shows the methodology adopted in the paper. First the user gives input to the book recommendation system, in the form of either user details or the name of a book, with or without the author's name. The input given by the user is then processed by the book recommendation system module, and the desired books are fetched from the database and displayed to the user. The user can now download the book of his choice after rating it. Once the user gives his star rating, the association rules are applied to that rating to show him other related books by the same author, or books chosen by other users with similar rating patterns.

The association rules produced by the data mining module are used to analyze the user's interest. The data mining technology is applied to the profiles database, where the star ratings and downloading patterns of all the users are stored. The recommendation module finds the appropriate match based on the analysis of the profile database according to the user's information, and then outputs book recommendation information by searching the book database according to the subsequent (consequent) of the selected rule, which may be of interest to the user. Recommendations are also made on the basis of the same author and the same domain as the book selected by the user. At the backend, the user's data in the database is analyzed in order to have a better understanding of the user's interests. Every time a new user rates a particular book, its average rating is calculated and stored in the database. So our book recommendation system uses a combination of both content based and collaborative filtering to have a better understanding of the user's interest.

DATABASE DESIGN

The database design of the application is as below.
Table: Books_Details and Novel

Field         Data type     Description
Book_id       Auto Number   Gives the Book Id of the book
Book_Name     Text          Gives the name of the book along with author
Name          Text          Gives the book name
Author        Text          Gives the book author
Stream        Text          Gives the book stream
Domain        Text          Gives the book domain
Star_Rating   Number        Gives the star rating of the book
Book_image    Text          Gives the book image
Book_pdf      Text          Gives the book pdf

Table 4.1: Books_Details and Novel

Table: User_Details

Field         Data type     Description
Username      Text          Gives the username
Password      Text          Gives the password
_address      Text          Gives the id of user
birth_date    Text          Gives the birth date of user

Table 4.2: User_Details

Table: Profiles

Field         Data type     Description
User_Id       Text          Gives the User Id of user
Downloads     Text          Gives the books downloaded by user
Uploads       Text          Gives the books uploaded by user
List          Text          Gives the list of books in MyList
Star_Rating   Text          Gives the star rating of book

Table 4.3: Profiles

The dataset is then uploaded to the Hadoop HDFS by converting the data in the datasheet into CSV format and uploading the CSV files to HDFS.

5. EXPERIMENTS AND ANALYSIS

To validate the performance of the proposed Book Recommendation System and its analysis based on Hadoop, this paper implements the platform and designs several scenarios for testing. The scenarios include data upload, data query, and data statistics. The data upload is operated on HDFS, while the data statistics are observed through the JobTracker and NameNode dashboards, which run in the web browser. This paper simulates a Book Recommendation System on the Hadoop platform. The platform maintains multiple items pertaining to users, like customer name, customer password, books downloaded, books uploaded, star rating etc. The data is stored in a database. The database includes more than 2000 books and the images corresponding to the books. A database of 1000 customers has been made. This paper deploys a single node Hadoop cluster with an Intel Core i5 CPU at 2.93 GHz, memory of 2 MB and a hard drive of 500 GB.
The OS is Windows 7 (64 bit), and the version of Hadoop is hadoop with a Hadoop plugin for Eclipse Europa 3.3. As Hadoop is designed to run on Linux systems, Cygwin, which provides a Linux-like environment and runs within Windows, has been used. The Tomcat is apache-tomcat , and the Java is jdk 1.6.0_17. Meanwhile, MS Access is chosen as the reference RDBMS. The same operations are conducted in the database in an identical environment to make a comparison.

Counter               Map       Reduce    Total
HDFS bytes read       1.83 MB      MB
HDFS bytes written       MB     1.57 MB
Local bytes read         MB     1.93 MB
Local bytes written   1.95 MB   1.93 MB   3.88 MB

Fig. 5.1: Map Reduce results
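The local and HDFS byte counters reported in Fig. 5.1 can be compared with a short calculation. This is a sketch under one assumption: the saving is expressed as the difference between the local and the HDFS figure divided by the local figure. The function name `saving` and that definition are illustrative, and the paper does not state which formula it used.

```python
# Counter values in MB as reported in Fig. 5.1.
counters_mb = {
    "read": {"local": 1.93, "hdfs": 1.83},
    "write": {"local": 3.88, "hdfs": 1.57},
}


def saving(local_mb, hdfs_mb):
    """Return (saving in MB, saving as a percentage of the local figure).

    Dividing by the local figure is an assumption made for this sketch.
    """
    diff = local_mb - hdfs_mb
    return diff, 100.0 * diff / local_mb


if __name__ == "__main__":
    for operation, values in counters_mb.items():
        diff, percent = saving(values["local"], values["hdfs"])
        print(f"{operation}: {diff:.2f} MB fewer bytes on HDFS ({percent:.1f} % of the local figure)")
```

With only about 4 MB of data these figures describe a single small run, so they are better read as an illustration of the comparison than as a benchmark.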

Fig. 5.1 gives a comparison of running the application on Hadoop and on the local system. When the application is run on the local system, the number of bytes required to be read is 1.93 MB, while when the read operation is performed from the HDFS system it is 1.83 MB. There is a difference of (1.93 - 1.83 = 0.1) MB, which is approximately 5 %. This is with around 4 MB of data. If the data increases, the performance increases further. Similarly, the number of bytes required to be written in the case of the local system is 3.88 MB, while the number of bytes required to be written in the case of HDFS is 1.57 MB. There is a difference of (3.88 - 1.57 = 2.31) MB. The efficiency improves by  %. This is the case when the system has around 4 MB of data. If the data increases, the efficiency increases further. This is the reason for using the Hadoop platform for Big Data. A Book Recommendation System is an apt Big Data application, as there can be millions of users and millions of books, thus demanding the use of Hadoop to improve efficiency.

6. CONCLUSION

This recommendation system can meet the book demands of readers to the greatest extent; meanwhile it can recommend books to the user based on his interest. It applies the association rules of data mining to book downloads, and provides an individualized recommendation system based on the download, upload and rating patterns of the user. Simulation results show that mining the users' book rating data to predict their interests gives good results. All data about books and users is analyzed to identify patterns through data mining. After that, a related user group, having similarity with the target user, is classified by collaborative filtering. MapReduce and Hadoop are expected to be useful for data intensive applications. However, there is a limited understanding of the trade-offs between different design choices in a Hadoop application, such as file system, streaming mode, networking, replication etc.
Future work includes bringing the data closer to Big Data by increasing the number of books as well as the number of users, so that the whole system performs better and the abilities of Hadoop can be put to the best use. Hadoop works best with applications whose data is kept on the cloud, because it is very difficult to keep and maintain petabytes of data on a hard disk. The next goal will be to transfer the recommendation system to the cloud.

REFERENCES

[1] Hadoop. < 2009>
[2] HDFS architecture. < docs/current/hdfsdesign.html, 2009>
[3] Kuroiwa Takanori; Bhalla Subhash, Dynamic Personalization for Book Recommendation System Using Web Services and Virtual Library Enhancements, Computer & Information Technology, 2007, 7th IEEE International Conference on Computer and Information Technology, Publication Year: 2007, Page(s):
[4] T. White, Hadoop: The Definitive Guide, O'Reilly Media, Yahoo! Press, June 5,
[5] Konstantin Shvachko; Hairong Kuang; Sanjay Radia; Robert Chansler, The Hadoop Distributed File System, Yahoo! Sunnyvale, California USA, /10/$ IEEE,
[6] Jin Shan; Fan Chunfeng; Meng Yun; Wu Xiaohai; Chen Qingzhang, The design of a new book auto recommendation system based on readers' interest, 2011 International Conference on Electrical and Control Engineering (ICECE), Publication Year: 2011, Page(s):
[7] G. Linden; B. Smith; J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing, pages 76-80, Jan/Feb
[8] R.J. Mooney; L. Roy, Content based book recommending using learning for text categorization, Proceedings of the 5th ACM Conference on Digital Libraries, pages ,
[9] Zhag, Guang-Qian; Sun, Wei, User Preferences to attributes of books for personalized recommendation, 2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS), Publication Year: 2012, Page(s):
[10] Maurya, M.; Mahajan, S., Performance analysis of Map Reduce programs on Hadoop cluster, 2012 World Congress on Information and Communication Technologies (WICT), Publication Year: 2012, Page(s):
[11] Xu, Weijia; Luo, Wei; Woodward, Nicholas, Analysis and Optimization of Data Import with Hadoop, 2012 IEEE 26th International Conference on Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), Publication Year: 2012, Page(s):
[12] Hadoop Overview. <


More information

Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies

Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies Savitha K Dept. of Computer Science, Research Scholar PSGR Krishnammal College for Women Coimbatore, India. Vijaya MS Dept.

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

A very short Intro to Hadoop

A very short Intro to Hadoop 4 Overview A very short Intro to Hadoop photo by: exfordy, flickr 5 How to Crunch a Petabyte? Lots of disks, spinning all the time Redundancy, since disks die Lots of CPU cores, working all the time Retry,

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris

More information

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT

METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT METHOD OF A MULTIMEDIA TRANSCODING FOR MULTIPLE MAPREDUCE JOBS IN CLOUD COMPUTING ENVIRONMENT 1 SEUNGHO HAN, 2 MYOUNGJIN KIM, 3 YUN CUI, 4 SEUNGHYUN SEO, 5 SEUNGBUM SEO, 6 HANKU LEE 1,2,3,4,5 Department

More information

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

An Efficient Analysis of Web Server Log Files for Session Identification using Hadoop Mapreduce

An Efficient Analysis of Web Server Log Files for Session Identification using Hadoop Mapreduce Proc. of Int. Conf. on Advances in Communication, Network, and Computing, CNC An Efficient Analysis of Web Server Log Files for Session Identification using Hadoop Mapreduce Savitha K 1 and Vijaya MS 2

More information

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM Sugandha Agarwal 1, Pragya Jain 2 1,2 Department of Computer Science & Engineering ASET, Amity University, Noida,

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo, 2010 Agenda Topic 1: Introduction Topic 2: Architecture

More information

A Study of Data Management Technology for Handling Big Data

A Study of Data Management Technology for Handling Big Data Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 9, September 2014,

More information

The Recovery System for Hadoop Cluster

The Recovery System for Hadoop Cluster The Recovery System for Hadoop Cluster Prof. Priya Deshpande Dept. of Information Technology MIT College of engineering Pune, India priyardeshpande@gmail.com Darshan Bora Dept. of Information Technology

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration Hoi-Wan Chan 1, Min Xu 2, Chung-Pan Tang 1, Patrick P. C. Lee 1 & Tsz-Yeung Wong 1, 1 Department of Computer Science

More information

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster Integrating SAP BusinessObjects with Hadoop Using a multi-node Hadoop Cluster May 17, 2013 SAP BO HADOOP INTEGRATION Contents 1. Installing a Single Node Hadoop Server... 2 2. Configuring a Multi-Node

More information

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com.

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. Ramlal Naik L Acme Tele Power LTD Haryana, India ramlalnaik@gmail.com. Abstract Big Data

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

More information

Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster

Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster Verification and Validation of MapReduce Program model for Parallel K-Means algorithm on Hadoop Cluster Amresh Kumar Department of Computer Science & Engineering, Christ University Faculty of Engineering

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala

More information

H2O on Hadoop. September 30, 2014. www.0xdata.com

H2O on Hadoop. September 30, 2014. www.0xdata.com H2O on Hadoop September 30, 2014 www.0xdata.com H2O on Hadoop Introduction H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms

More information

How To Analyze Network Traffic With Mapreduce On A Microsoft Server On A Linux Computer (Ahem) On A Network (Netflow) On An Ubuntu Server On An Ipad Or Ipad (Netflower) On Your Computer

How To Analyze Network Traffic With Mapreduce On A Microsoft Server On A Linux Computer (Ahem) On A Network (Netflow) On An Ubuntu Server On An Ipad Or Ipad (Netflower) On Your Computer A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce Anjali P P and Binu A Department of Information Technology, Rajagiri School of Engineering and Technology,

More information

DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVIRONMENT

DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVIRONMENT DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVIRONMENT Gita Shah 1, Annappa 2 and K. C. Shet 3 1,2,3 Department of Computer Science & Engineering, National Institute of Technology,

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

Survey on Scheduling Algorithm in MapReduce Framework

Survey on Scheduling Algorithm in MapReduce Framework Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India

More information

White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP

White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP White Paper Big Data and Hadoop Abhishek S, Java COE www.marlabs.com Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP Table of contents Abstract.. 1 Introduction. 2 What is Big

More information

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759

More information

Reduction of Data at Namenode in HDFS using harballing Technique

Reduction of Data at Namenode in HDFS using harballing Technique Reduction of Data at Namenode in HDFS using harballing Technique Vaibhav Gopal Korat, Kumar Swamy Pamu vgkorat@gmail.com swamy.uncis@gmail.com Abstract HDFS stands for the Hadoop Distributed File System.

More information

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

Energy-Saving Cloud Computing Platform Based On Micro-Embedded System

Energy-Saving Cloud Computing Platform Based On Micro-Embedded System Energy-Saving Cloud Computing Platform Based On Micro-Embedded System Wen-Hsu HSIEH *, San-Peng KAO **, Kuang-Hung TAN **, Jiann-Liang CHEN ** * Department of Computer and Communication, De Lin Institute

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

Intro to Map/Reduce a.k.a. Hadoop

Intro to Map/Reduce a.k.a. Hadoop Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by

More information

Design of Electric Energy Acquisition System on Hadoop

Design of Electric Energy Acquisition System on Hadoop , pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University

More information

International Journal of Innovative Research in Computer and Communication Engineering

International Journal of Innovative Research in Computer and Communication Engineering FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

More information

Highly Available Hadoop Name Node Architecture-Using Replicas of Name Node with Time Synchronization among Replicas

Highly Available Hadoop Name Node Architecture-Using Replicas of Name Node with Time Synchronization among Replicas IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 3, Ver. II (May-Jun. 2014), PP 58-62 Highly Available Hadoop Name Node Architecture-Using Replicas

More information

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Detection of Distributed Denial of Service Attack with Hadoop on Live Network Detection of Distributed Denial of Service Attack with Hadoop on Live Network Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof.Rahul Patil 5 Students, Dept. of Computer, PCCOE,

More information

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE Mr. Santhosh S 1, Mr. Hemanth Kumar G 2 1 PG Scholor, 2 Asst. Professor, Dept. Of Computer Science & Engg, NMAMIT, (India) ABSTRACT

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

L1: Introduction to Hadoop

L1: Introduction to Hadoop L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis , 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

HadoopRDF : A Scalable RDF Data Analysis System

HadoopRDF : A Scalable RDF Data Analysis System HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data Analytics by Using Hadoop

Big Data Analytics by Using Hadoop Governors State University OPUS Open Portal to University Scholarship All Capstone Projects Student Capstone Projects Spring 2015 Big Data Analytics by Using Hadoop Chaitanya Arava Governors State University

More information

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data

More information

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University

More information

Advances in Natural and Applied Sciences

Advances in Natural and Applied Sciences AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Clustering Algorithm Based On Hadoop for Big Data 1 Jayalatchumy D. and

More information

Big Data Analytics for Net Flow Analysis in Distributed Environment using Hadoop

Big Data Analytics for Net Flow Analysis in Distributed Environment using Hadoop Big Data Analytics for Net Flow Analysis in Distributed Environment using Hadoop 1 Amreesh kumar patel, 2 D.S. Bhilare, 3 Sushil buriya, 4 Satyendra singh yadav School of computer science & IT, DAVV, Indore,

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Click Stream Data Analysis Using Hadoop

Click Stream Data Analysis Using Hadoop Governors State University OPUS Open Portal to University Scholarship Capstone Projects Spring 2015 Click Stream Data Analysis Using Hadoop Krishna Chand Reddy Gaddam Governors State University Sivakrishna

More information