Recognition of Satellite Images of Large Scale Data Based on Map-Reduce Framework


Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar
Dept. of Information Technology, JSPM's BSIOTR (W), Wagholi, Pune, Maharashtra
Guide: Prof. P. A. Bandgar, Head, Dept. of Information Technology
For correspondence: vidyadjadhav@gmail.com; harshada.nazirkar@gmail.com; smidekar@gmail.com

Abstract: In today's world of cloud and grid computing, integration of data from heterogeneous databases is inevitable, and it becomes complex when the size of the database is very large. Map-Reduce (M-R) is a framework designed specifically for processing huge datasets on distributed sources, and Apache's Hadoop is an implementation of M-R. So far, Hadoop has been applied successfully to file-based datasets. This project proposes to utilize the parallel and distributed processing capability of Hadoop M-R for handling images in large datasets. The presented methodology of land-cover recognition provides a scalable solution for automatic satellite-imagery analysis, especially when GIS data is not readily available or when the surface changes due to catastrophic events such as floods, hurricanes, and snow storms. We use algorithms for image difference, image duplication, zoom-in, and grayscale conversion.

Index Terms: Map-Reduce (M-R), HDFS (Hadoop Distributed File System), HIPI (Hadoop Image Processing Interface)

I. INTRODUCTION

Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters. A Hadoop cluster is a set of commodity machines networked together in one location; data storage and processing all occur within this cluster. Different users can submit computing jobs to Hadoop from individual clients, which can be their own desktop machines in locations remote from the cluster. A small Hadoop cluster includes a single master and multiple worker nodes. The master node runs the JobTracker, TaskTracker, NameNode and DataNode, as shown in Fig. 1.

Fig. 1. A multi-node Hadoop cluster.

A slave or worker node acts as both a DataNode and a TaskTracker, though it is possible to have data-only worker nodes and compute-only worker nodes.

In a larger cluster, HDFS is managed through a dedicated NameNode server that hosts the file-system index, and a secondary NameNode that can generate snapshots of the NameNode's memory structures, thus preventing file-system corruption and reducing loss of data. Similarly, a standalone JobTracker server can manage job scheduling. In clusters where the Hadoop MapReduce engine is deployed against an alternate file system, the NameNode, secondary NameNode and DataNode architecture of HDFS is replaced by the file-system-specific equivalent.

II. PROPOSED SYSTEM

In a Hadoop cluster, data is distributed to all the nodes of the cluster as it is loaded in. HDFS splits large data files into chunks that are managed by different nodes in the cluster. In addition, each chunk is replicated across several machines, so that a single machine failure does not result in any data being unavailable. An active monitoring system then re-replicates the data in response to system failures that would otherwise result in partial storage. Even though the file chunks are replicated and distributed across several machines, they form a single namespace, so their contents are universally accessible.

IV. SYSTEM ARCHITECTURE

The system architecture, shown in Fig. 2, includes the following steps:
1. A large number of images is stored in the local file system.
2. This bundle of images is fed into the Hadoop Distributed File System.
3. On HDFS, we execute a set of operations such as duplicate-image removal, zoom-in and finding differences among images, using M-R programs.
4. The result is then uploaded to a web server and shown to the user through a web application.

Fig. 2. System architecture.

V. ALGORITHMS USED

A. Zoom-In Algorithm
This algorithm takes the original image and creates four image tiles out of it; that is, it splits the original image into four pieces and re-draws each part onto a new image whose size equals the original. For example, a 100x100 original image is split into four 50x50 parts, and each part is then redrawn as a 100x100 image, so we obtain four 100x100 images from one 100x100 original.
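A minimal sketch of this tiling and re-drawing step, written in plain Java outside the M-R pipeline (the class and method names are our own, purely illustrative):

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Illustrative sketch: split an image into four quadrants and scale each
// quadrant back up to the original dimensions (the "zoom-in" step).
public class ZoomIn {
    public static BufferedImage[] zoom(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage[] tiles = new BufferedImage[4];
        int i = 0;
        for (int ty = 0; ty < 2; ty++) {
            for (int tx = 0; tx < 2; tx++) {
                // Extract one quadrant (e.g. 50x50 of a 100x100 image).
                BufferedImage quadrant = src.getSubimage(tx * w / 2, ty * h / 2, w / 2, h / 2);
                // Redraw the quadrant at the original size (e.g. 100x100).
                BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
                Graphics2D g = scaled.createGraphics();
                g.drawImage(quadrant, 0, 0, w, h, null);
                g.dispose();
                tiles[i++] = scaled;
            }
        }
        return tiles;
    }
}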
B. Difference Algorithm
Here we divide the two images into small chunks and compare the corresponding chunks of image one and image two. Comparison process:
1. Compare the intensity of the chunks.
2. Compare the colour codes of the chunks; if the chunks differ, mark them with a red box.
The comparison is repeated for all the chunks, and a new image is drawn with the differing chunks marked by red boxes, i.e. showing the differences. The difference image is then uploaded to the Tomcat server.
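A minimal sketch of the chunk-by-chunk comparison, again in plain Java; the chunk size and intensity threshold below are illustrative assumptions rather than values taken from the paper:

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Illustrative sketch: compare two equally sized images chunk by chunk
// and outline differing chunks with a red box on a copy of the first image.
public class ImageDifference {
    static final int CHUNK = 16;        // assumed chunk size in pixels
    static final int THRESHOLD = 30;    // assumed intensity tolerance

    public static BufferedImage diff(BufferedImage a, BufferedImage b) {
        int w = a.getWidth(), h = a.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.drawImage(a, 0, 0, null);
        g.setColor(Color.RED);
        for (int y = 0; y < h; y += CHUNK) {
            for (int x = 0; x < w; x += CHUNK) {
                if (chunksDiffer(a, b, x, y, Math.min(CHUNK, w - x), Math.min(CHUNK, h - y))) {
                    g.drawRect(x, y, CHUNK - 1, CHUNK - 1);   // mark the chunk with a red box
                }
            }
        }
        g.dispose();
        return out;
    }

    // Average the per-pixel intensity difference over one chunk.
    private static boolean chunksDiffer(BufferedImage a, BufferedImage b, int x0, int y0, int cw, int ch) {
        long total = 0;
        for (int y = y0; y < y0 + ch; y++) {
            for (int x = x0; x < x0 + cw; x++) {
                int pa = a.getRGB(x, y), pb = b.getRGB(x, y);
                int ia = ((pa >> 16 & 0xFF) + (pa >> 8 & 0xFF) + (pa & 0xFF)) / 3;
                int ib = ((pb >> 16 & 0xFF) + (pb >> 8 & 0xFF) + (pb & 0xFF)) / 3;
                total += Math.abs(ia - ib);
            }
        }
        return total / (cw * ch) > THRESHOLD;
    }
}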
C. Duplication Algorithm
In this algorithm we again divide the images into small chunks and compare the corresponding chunks of the images. All of the image files are packed as binary data into a single sequence file, and the actual job then filters this file to find the duplicates.
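One way to realise this as an M-R job is sketched below: the mapper emits a content hash of each image as its key, and the reducer reports every key that is shared by more than one file. The SequenceFile key/value types and the MD5 digest are our own assumptions, not necessarily what the paper implemented.

import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative duplicate finder over a SequenceFile of (file name, image bytes) records.
// Mapper: emit (MD5 digest of the image bytes, file name).
public class DuplicateMapper extends Mapper<Text, BytesWritable, Text, Text> {
    @Override
    protected void map(Text fileName, BytesWritable imageBytes, Context context)
            throws IOException, InterruptedException {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(imageBytes.getBytes(), 0, imageBytes.getLength());
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            context.write(new Text(hex.toString()), fileName);
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }
}

// Reducer: any digest that arrives with more than one file name marks a set of duplicates.
class DuplicateReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text digest, Iterable<Text> fileNames, Context context)
            throws IOException, InterruptedException {
        StringBuilder names = new StringBuilder();
        int count = 0;
        for (Text name : fileNames) {
            if (count++ > 0) names.append(", ");
            names.append(name.toString());
        }
        if (count > 1) {
            context.write(digest, new Text(names.toString()));
        }
    }
}

In the job driver, the input format would be set to SequenceFileInputFormat so that each record delivers one (file name, image bytes) pair to the mapper.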
D. Grayscale Algorithm
A grayscale image is one in which the only colours are shades of gray. The reason for distinguishing such images from other colour images is that less information needs to be provided for each pixel: a "gray" colour is one in which the red, green and blue components have equal intensity in RGB space, so it is only necessary to specify a single intensity value for each pixel, as opposed to the three intensities needed for each pixel of a full-colour image. Often the grayscale intensity is stored as an 8-bit integer, giving 256 possible shades of gray from black to white.
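A minimal per-pixel sketch of the conversion, assuming a simple average of the three channels (a luminance-weighted average would serve equally well):

import java.awt.image.BufferedImage;

// Illustrative sketch: convert an RGB image to an 8-bit grayscale image
// by averaging the red, green and blue components of each pixel.
public class Grayscale {
    public static BufferedImage toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int v = (r + g + b) / 3;                       // single 8-bit intensity, 0..255
                gray.setRGB(x, y, (v << 16) | (v << 8) | v);
            }
        }
        return gray;
    }
}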
VI. TECHNOLOGY USED

M-R Framework
MapReduce is a data processing model whose greatest advantage is the easy scaling of data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial, but once an application is written in the MapReduce form, scaling it to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.

Pseudo-code for the map and reduce functions:

map(String filename, String document)
    List<String> T = tokenize(document);
    for each token in T
        emit((String) token, (Integer) 1);

reduce(String token, List<Integer> values)
    Integer sum = 0;
    for each value in values
        sum = sum + value;
    emit((String) token, (Integer) sum);
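The same token-counting logic, expressed against the Hadoop MapReduce Java API, might look roughly as follows (a sketch for orientation, not code taken from the paper):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the pseudo-code above using the Hadoop MapReduce Java API.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            context.write(new Text(tokens.nextToken()), ONE);   // emit (token, 1)
        }
    }
}

class TokenCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text token, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(token, new IntWritable(sum));              // emit (token, sum)
    }
}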
HDFS (Hadoop Distributed File System)
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream them at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size; HDFS has been used in this way to manage some 40 petabytes of enterprise data at Yahoo!. A high-end machine with four I/O channels, each with a throughput of 100 MB/s, would need roughly three hours just to read a 4 TB data set (4 TB / 400 MB/s is about 10,000 seconds). With Hadoop, the same data set is divided into smaller blocks (typically 64 MB) that are spread among many machines in the cluster via HDFS. With a modest degree of replication, the cluster machines can read the data set in parallel and provide much higher throughput, and such a cluster of commodity machines turns out to be cheaper than one high-end server.
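For orientation, this is roughly how an application streams a file back out of HDFS through the Hadoop FileSystem API; the path shown is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: open a file stored in HDFS and stream its contents.
public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                  // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/hadoop/images/sample.jpg");   // placeholder path
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) > 0) {
                // process `read` bytes of the image here
            }
        }
    }
}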
The HIPI Framework
HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment. HIPI was created to empower researchers and present them with a capable tool that enables research involving image processing and vision to be performed easily. We use HIPI because it is designed to:
1. Provide an open, extensible library for image processing and computer vision applications in a MapReduce framework.
2. Store images efficiently for use in MapReduce applications.
3. Allow for simple filtering of a set of images.
4. Present users with an intuitive interface for image-based operations and hide the details of the MapReduce framework.
5. Set up applications so that they are highly parallelized and balanced, so that users do not have to worry about such details.

VII. CONCLUSION

This work has demonstrated the feasibility of massively parallel semantic classification of satellite imagery. The system hides the complex details of Hadoop's powerful MapReduce framework and brings to the forefront what users care about most: the images. It has been created with the intent of operating on large sets of images, and it gives the user a simple way to filter image sets and to control the types of images used in the MapReduce tasks. MapReduce programs apply a set of operations such as duplicate removal, zoom-in, difference detection and grayscale conversion to a given satellite image.

VIII. ACKNOWLEDGEMENT

We take this opportunity to thank our project guide and Head of the Department, Prof. P. A. Bandgar, for the valuable guidance and mentoring provided throughout this project. We must also give great thanks to Miss Sneha and Mr. Bhagat for their guidance and support.