LARGE SCALE SATELLITE IMAGE PROCESSING USING HADOOP DISTRIBUTED SYSTEM




Sarade Shrikant D., Ghule Nilkanth B., Disale Swapnil P., Sasane Sandip R.

Abstract- Processing large volumes of images is unavoidable when satellite imagery is involved, and the amount of data continues to grow as more information becomes available. With this increasing amount of surface and temporal data, recognition, segmentation, and event detection in satellite images with a highly scalable system become more and more desirable. In this paper, a semantic taxonomy is constructed for the land-cover classification of satellite images. The whole system is built on a distributed Hadoop computing platform and is divided into two major parts: the Training classifier and the Running classifier. The Training part classifies subsequent images into classes such as vegetation, building, pavement, water, and snow. Large files are distributed and further divided among multiple data nodes. The map processing jobs located on all nodes operate on their local copies of the data. The name node stores only the metadata and the log information, while data transfer to and from HDFS is done through the Hadoop API. The Training classifier is implemented using the Hadoop MapReduce framework and is based on Google Earth. The Running classifier performs zoom-in and zoom-out and calculates the difference between old and new images.

Index Terms- event detection, Hadoop, MapReduce, satellite, segmentation.

I. INTRODUCTION

Hadoop is an open source framework for writing and running distributed applications that process large amounts of data, and it provides the basis for writing scalable, distributed, data-intensive programs. Every day we are surrounded by information in the form of data: people upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. You may even be reading a book as digital data on your computer screen, and your purchase of that book is certainly recorded as data with some retailer.

The exponential growth of data presents challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people. Existing tools were becoming inadequate to process such large data sets. Google was the first to publicize MapReduce, a system it had used to scale its data processing needs. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system, called Hadoop. Soon after, Yahoo and others rallied around to support this effort. Today, Hadoop is a core part of the computing infrastructure for many web companies, such as Yahoo, Facebook, LinkedIn, and Twitter. Many more traditional businesses, such as media and telecom, are beginning to adopt this system too. Large-scale distributed data processing in general is rapidly becoming an important skill set for many programmers.

Table 1.1 Apache Log Processing with Cascading

Configuration   Runtime   (unlabeled)   /Node
1 Node          21m46s    0.127         0.127
3 Nodes         8m3s      0.0471        0.0157
15 Nodes        1m30s     0.00878       0.000585
Naive Perl      42m49s    0.251         0.251

Events that occur on the earth's surface can be detected. For example, events such as flooding, tsunamis, and snow storms can be detected from the measurable change in ground surface cover that results from damage to existing structures. Satellite images require a very large amount of data storage. The Large Scale Land-cover Recognition System collects a large amount of high-resolution satellite image data. It supports training a collection of classifiers and performing subsequent image classification in a distributed environment.
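The idea of detecting an event from a measurable change in ground cover can be illustrated with a small sketch. The Java below is an illustration under stated assumptions, not the system described in this paper: it assumes two land-cover label grids of the same region taken at different dates, and the integer class codes, the changedFraction helper, and the 0.3 threshold are all hypothetical.

```java
public class ChangeDetectionSketch {
    // Hypothetical class codes, for illustration only.
    public static final int VEGETATION = 0;
    public static final int BUILDING = 1;
    public static final int WATER = 2;

    /** Returns the fraction of grid cells whose label differs between the two dates. */
    public static double changedFraction(int[][] before, int[][] after) {
        if (before.length != after.length) {
            throw new IllegalArgumentException("grids must have the same height");
        }
        int changed = 0;
        int total = 0;
        for (int r = 0; r < before.length; r++) {
            if (before[r].length != after[r].length) {
                throw new IllegalArgumentException("grids must have the same width");
            }
            for (int c = 0; c < before[r].length; c++) {
                if (before[r][c] != after[r][c]) {
                    changed++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) changed / total;
    }

    public static void main(String[] args) {
        // Two buildings replaced by water between the two dates (e.g. flooding).
        int[][] before = {{BUILDING, BUILDING}, {VEGETATION, WATER}};
        int[][] after = {{WATER, WATER}, {VEGETATION, WATER}};
        double fraction = changedFraction(before, after);
        double threshold = 0.3; // assumed value, would need tuning on real data
        System.out.println("changed fraction = " + fraction);
        System.out.println(fraction > threshold ? "possible event" : "no significant change");
    }
}
```

In a distributed setting each grid tile could be compared independently, which is why this kind of per-cell comparison parallelizes well.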
ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 731

MapReduce is a popular parallel programming model, first introduced by Google, that is designed to process and generate large-scale data sets in a distributed environment. It provides a convenient way to parallelize data analysis. Its advantages include convenience, robustness, and scalability.

II. LITERATURE SURVEY

In an Internet service, the database is the most important part. Very large collections of images are stored in such databases, which makes them difficult to handle; duplicated images, for example, increase the data size. It is well known that the larger the data, the longer the processing time.

With the proliferation of online photo storage and social media from websites such as Facebook and Picasa, the amount of image data available is larger than ever before and growing more rapidly every day. Billions of images are available to us on the web. These images are enriched, however, by the fact that users supply tags (of objects, faces, etc.), comments, titles, and descriptions for them. This information provides an unprecedented amount of context for images. The idea can be applied to a wider range of image features that allow us to examine and analyze images in new ways.

Data today is typically processed using Oracle versions such as 9i or 10g, or other DBMS software. With the increasing use of the Internet all over the world, however, the data on the net is growing rapidly, so mass data can no longer be processed with Oracle or any other existing DBMS software. The report generated after analysis will help users understand their usage. To analyze the data and images, we use Hadoop.

III. EXISTING SYSTEM

Current processing of images is done in an ordinary sequential way. The program loads image after image, processing each image alone before writing the newly processed image to a storage device. Generally, we use very ordinary tools such as those found in Photoshop. Besides, many ordinary C and Java programs can be downloaded from the Internet or easily developed to perform such image processing tasks. Most of these tools run on a single computer with a Windows operating system. Although batch processing is available in these single-processor programs, processing runs into problems because of their limited capabilities. Therefore, we need a new parallel approach that works effectively on massive image data.

IV. PROPOSED SYSTEM

From previous studies, it has been observed that image processing consists of the following steps:
1 Images Upload
2 Hadoop Distributed File System
3 MapReduce Programs
4 Resultant Image

Images Upload: A large number of images are acquired from NASA and stored in the file system in compressed format. Fig. 4.1 shows some sample images stored in the file system.

Fig. 4.1 Sample Images

Hadoop Distributed File System: To process a large number of images efficiently, this bundle of images is fed to the Hadoop Distributed File System. The acquired signature image is as shown in Fig. 3.1. It is necessary to divide these higher-resolution images into multiple segments and assign each image segment to a different slave machine so that the images can be compared efficiently. This can be done in a distributed environment.

Fig. 4.2 Segmented image

MapReduce: The objective of this phase is to extract the features of the test image that will be compared to the features of the stored images for the image processing operations. On the Hadoop Distributed File System, we execute a set of operations, namely (i) duplicate image removal, (ii) zoom in or zoom out, and (iii) finding differences among images, using MapReduce programs.

Resultant Image Upload: The purpose of the resultant image generation phase is to generate the resultant image, which is then uploaded to the web server and shown to the user through the web application, depending on the image processing operation selected.

Fig. 4.3 Resultant image

V. SYSTEM ARCHITECTURE

A Large Scale Land-cover Recognition System is essentially a web application supported by a backend database. It is built with technologies such as Java and AJAX. The basic idea is to use MapReduce to split the large input data set into many small pieces and assign the resulting small tasks to different machines. A scalable modeling system implemented in the Hadoop MapReduce framework is used for training the classifiers and performing subsequent image classification.

Fig. 4.1 System Architecture

The system works as follows:
1. A large number of images are stored in the file system.
2. This bundle of images is fed to the Hadoop Distributed File System.
3. On HDFS, we execute a set of operations such as duplicate image removal, zoom in, and finding differences among images, using MapReduce programs.
4. The result is then uploaded to the web server and shown to the user through the web application.

Satellite image data continues to grow and evolve as higher spatial and temporal resolutions become available. With sufficient spatial and temporal resolution, event detection becomes possible. With this increasing amount of surface and temporal data, recognition, segmentation, and event detection in satellite images with a highly scalable system become more and more desirable. In this paper, a semantic taxonomy is constructed for the land-cover classification of satellite images.

VI. ALGORITHM

Much of what the layperson thinks of as statistics is counting, and many basic Hadoop jobs involve counting. We can write a MapReduce program for this task.
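The counting pattern can be mimicked on a single machine to see how the map, group-by-key, and reduce steps fit together. The following is a sketch in plain Java, not the Hadoop API and not the authors' code: the countByKey helper and the land-cover labels used as input are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelCountSketch {
    /** Counts how often each label occurs, using explicit map, group, and reduce steps. */
    public static Map<String, Integer> countByKey(List<String> labels) {
        // Map step: emit (label, 1) for every record.
        // Grouping the emitted values by key imitates the shuffle between map and reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String label : labels) {
            grouped.computeIfAbsent(label, k -> new ArrayList<>()).add(1);
        }
        // Reduce step: sum the values collected for each key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) {
                sum += one;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical labels assigned to four image tiles.
        List<String> labels = Arrays.asList("water", "vegetation", "water", "snow");
        System.out.println(countByKey(labels)); // prints {snow=1, vegetation=1, water=2}
    }
}
```

On a cluster, the map step would run on the nodes holding the data and the reduce step would receive all values for one key, but the logic per key stays the same.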
As we said earlier, you hardly ever write a MapReduce program from scratch. Usually you have an existing MapReduce program that processes the data in a similar way. We already have a program for getting the inverted citation index. We can modify that program to output the count instead of the list of citing patents. The modification is needed only in the Reducer. If we choose to output the count as an IntWritable, we need to specify IntWritable in three places in the Reducer code. We called them V3 in our notation. For example:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Nested inside the job's driver class.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int count = 0;
            // Count the values for this key instead of listing them.
            while (values.hasNext()) {
                values.next();
                count++;
            }
            output.collect(key, new IntWritable(count));
        }
    }

By changing a few lines and matching class types, we have a new MapReduce program. This program may seem a minor modification. We expect a large number of patents to have been cited only once, and a small number may have been cited hundreds of times. It would be interesting to see the distribution of the citation counts.

Fig. 6.1 Map Reduce Algorithm

VII. SYSTEM FEATURES

This product is independent and self-contained and includes the following components:
Scaling
Feature Extraction
Recognition

Scaling: Image scaling is the process of resizing a digital image. Scaling is a non-trivial process that involves a trade-off between efficiency, smoothness, and sharpness. As the size of an image is increased, the pixels that comprise the image become increasingly visible, making the image appear "soft". Conversely, reducing an image tends to enhance its smoothness and apparent sharpness.

Feature Extraction: MapReduce allows the computation to be done in two stages: the map stage and then the reduce stage. The data are split into sets of key-value pairs, and these are processed in parallel by the map stage, with a degree of parallelism that matches the number of nodes dedicated as slaves. This process generates intermediate key-value pairs that are temporary and are later directed to the reduce stage. Within the map stage or the reduce stage, processing is conducted in parallel. The two stages run sequentially: the reduce stage starts when the map stage finishes.

Recognition: On the Hadoop Distributed File System, the image recognition phase generates the resultant image, which is then uploaded to the web server and shown to the user through the web application, depending on the image processing operation.

VIII. FUTURE ENHANCEMENT

In the future, we might focus on using different image sources with different algorithms that are computationally intensive in nature. Our application could then benefit the following sectors:
Meteorological disaster - violent, sudden, and destructive change to the environment related to, produced by, or affecting the earth's atmosphere, especially the weather-forming processes.
Military navigation - the study of traversing unfamiliar terrain on foot or in a land vehicle.
Monitoring around the globe - extracting discriminative information about regions of the globe for which GIS data is not available.

IX. CONCLUSION

In this paper, a case study is presented on implementing parallel processing of remote sensing images in TIF format using the Hadoop MapReduce framework. The experimental results have shown that typical image processing algorithms can be effectively parallelized, with acceptable run times, when applied to remote sensing images. A large number of images cannot be processed efficiently in the customary sequential manner. Although originally designed for text processing, Hadoop MapReduce installed on a parallel cluster proved suitable for processing TIF format images in large quantities. We therefore decided to implement parallel processing with Hadoop, which is better suited to large data sizes than to computationally intensive applications.

Authors
Sarade Shrikant D., Tal- Ahmednagar, Dist- India.
Ghule Nilkanth B., Tal- Ahmednagar, Dist- India.
Disale Swapnil P., Tal- Ahmednagar, Dist- India.
Sasane Sandip R., Tal- Ahmednagar, Dist- India.