Best Practices for Building and Deploying Predictive Models Over Big Data. Module 12: Case Study Matsu



Best Practices for Building and Deploying Predictive Models Over Big Data
Module 12: Case Study Matsu
Robert Grossman, Open Data Group & Univ. of Chicago
Collin Bennett, Open Data Group
October 23, 2012

Zoom Levels
- Zoom Level 1: 4 images
- Zoom Level 2: 16 images
- Zoom Level 3: 64 images
- Zoom Level 4: 256 images
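The counts on this slide follow a simple pattern: each zoom level holds four times as many images as the one before it, so level z holds 4^z images. A minimal sketch of that arithmetic (the function name `tiles_at_zoom` is hypothetical, not taken from the Matsu code):

```python
def tiles_at_zoom(level: int) -> int:
    """Number of images at a zoom level, assuming each level quadruples the count."""
    if level < 0:
        raise ValueError("zoom level must be non-negative")
    return 4 ** level


# Reproduce the four counts listed on the slide.
for level in range(1, 5):
    print(f"Zoom Level {level}: {tiles_at_zoom(level)} images")
```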

Build Tile Cache - Mapper
Step 1: Input to Mapper
- Mapper Input Key: Bounding Box (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5)
- Mapper Input Value: image (not reproduced here)
Step 2: Processing in Mapper
- Mapper resizes and/or cuts up the original image into pieces to output Bounding Boxes
Step 3: Mapper Output
- One key/value pair per piece. Mapper Output Key: Bounding Box. Mapper Output Value: image piece (not reproduced here)
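The key side of the mapper step can be sketched as follows. This is an illustration only: it assumes the mapper splits its input bounding box into an n x n grid of equal sub-boxes, which the slide does not state, and it leaves out the image cutting and resizing. `BoundingBox` and `split_bounding_box` are hypothetical names.

```python
from typing import Iterator, NamedTuple


class BoundingBox(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def split_bounding_box(box: BoundingBox, n: int) -> Iterator[BoundingBox]:
    """Yield the n x n equal sub-boxes of `box`, row by row from the bottom-left."""
    width = (box.maxx - box.minx) / n
    height = (box.maxy - box.miny) / n
    for row in range(n):
        for col in range(n):
            minx = box.minx + col * width
            miny = box.miny + row * height
            yield BoundingBox(minx, miny, minx + width, miny + height)


# The input key shown on the slide, cut into four output keys.
source = BoundingBox(minx=-135.0, miny=45.0, maxx=-112.5, maxy=67.5)
pieces = list(split_bounding_box(source, 2))
for piece in pieces:
    print(piece)
```

In a real job each emitted key would be paired with the matching image piece as the output value.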

Build Tile Cache - Reducer
Step 1: Input to Reducer
- Reducer Key Input: Bounding Box (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
- Reducer Value Input: images (not reproduced here)
Step 2: Reducer Output
- Assemble images based on bounding box
- Output to Accumulo
- Builds up layers for WMS for various datasets

Tiling procedure in detail

Preprocess Satellite Imagery
- EO-1 images are provided by NASA as Level 1 images
- Each band and the metadata are individual files
- For distributed processing in Hadoop, we need to read all of an image's bands in the same map instance
- So we serialize the band files into a single file

Image Serialization
Project Matsu supports two solutions:
- Regular file, Base64-encoded: entire file is a single line
- Hadoop SequenceFile, Base64-encoded: each band is a single line
Each approach uses Avro (open source Apache Software Foundation project) for serialization.

Serialization Approaches
Regular file:
- Mapper reads every band and specifies which ones are kept
- Less efficient, but more portable as it does not rely on Hadoop SequenceFile support
Hadoop SequenceFile:
- Mapper specifies which bands to read
- More efficient, as only the bands needed for the analytic are read
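The band-per-line idea can be illustrated with the Python standard library alone. This sketch is not the Matsu format: a tab-separated `name<TAB>Base64` line stands in for the Avro records and the Hadoop SequenceFile that the slides describe, and the function names and band names are hypothetical. It shows why one line per band lets a reader skip bands it does not need without decoding them.

```python
import base64
from typing import Dict, Iterable, List, Set


def serialize_bands(bands: Dict[str, bytes]) -> List[str]:
    """Encode each band as one text line: '<band name>\t<Base64 payload>'."""
    return [
        f"{name}\t{base64.b64encode(data).decode('ascii')}"
        for name, data in bands.items()
    ]


def read_bands(lines: Iterable[str], wanted: Set[str]) -> Dict[str, bytes]:
    """Decode only the requested bands; other lines are skipped undecoded."""
    selected = {}
    for line in lines:
        name, payload = line.rstrip("\n").split("\t", 1)
        if name in wanted:
            selected[name] = base64.b64decode(payload)
    return selected


# Toy payloads standing in for real band rasters.
bands = {"B183": b"\x00\x01", "B184": b"\x02\x03", "B185": b"\x04\x05"}
lines = serialize_bands(bands)
subset = read_bands(lines, {"B183", "B185"})
print(sorted(subset))
```

Base64 output never contains a newline or a tab, so each band stays on a single line and the separator is unambiguous.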

Map
- An image is read by a single mapper
- Actual bands are selected and/or virtual bands created
- Sent to reducer by geographical tiles

Reduce
- Reducers produce Web Tiles for each zoom level
- Storage in Accumulo
- Index: Graphic Tile Timestamp
- Value: Image Metadata

Building zoomed-out images
- Reduce step overlays tiles and builds zoomed-out images
- Four neighboring tiles are combined and shrunk to decrease by one zoom level
- Process continues until one image covers the entire region that the reducer is responsible for (e.g. 1/2^N of the world)
- Tile Tdepth-lngindex-latindex: parent is depth - 1, lng/2, lat/2
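The zoom-out step can be sketched in plain Python. Several details here are assumptions rather than facts from the slides: the parent rule is read as depth minus one with integer division of both indices, tiles are square single-band grids of numbers, and "shrunk" is implemented as averaging each 2 x 2 block of pixels. The function names are hypothetical.

```python
from typing import List

Tile = List[List[float]]  # rows of pixel values (single band, for brevity)


def parent_key(key: str) -> str:
    """Map 'T<depth>-<lngindex>-<latindex>' to the key of its parent tile."""
    depth, lng, lat = (int(part) for part in key[1:].split("-"))
    if depth <= 0:
        raise ValueError("the root tile has no parent")
    return f"T{depth - 1}-{lng // 2}-{lat // 2}"


def combine_and_shrink(top_left: Tile, top_right: Tile,
                       bottom_left: Tile, bottom_right: Tile) -> Tile:
    """Stitch four equal tiles into a 2x2 mosaic, then halve its resolution."""
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    mosaic = top + bottom
    # Average each 2x2 pixel block so the result has the size of one input tile.
    return [
        [
            (mosaic[r][c] + mosaic[r][c + 1]
             + mosaic[r + 1][c] + mosaic[r + 1][c + 1]) / 4.0
            for c in range(0, len(mosaic[0]), 2)
        ]
        for r in range(0, len(mosaic), 2)
    ]


print(parent_key("T3-5-6"))
shrunk = combine_and_shrink(
    [[1, 1], [1, 1]], [[2, 2], [2, 2]],
    [[3, 3], [3, 3]], [[4, 4], [4, 4]],
)
print(shrunk)
```

Applying `parent_key` and `combine_and_shrink` repeatedly walks up the pyramid until a single image covers the reducer's whole region.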

Analytic Modules
- If an analytic produces a web tile, then it can piggyback along the web tiling workflow
- The data generated is only the additional bands to be displayed

Embedding Multiple Modules

Example: Algebraic combination of spectral bands
- Some CO2 activity follows visible cloud formations, some doesn't
- Icelandic volcano in April 2010 (Eyjafjallajökull)
- Visible frame is full of ash clouds
- CO2 distribution is non-uniform

Module Code to Create Additional Band:

    sum1 = 4.
    sumx = 183. + 184. + 188. + 189.
    sumxx = 183.**2 + 184.**2 + 188.**2 + 189.**2
    sumy = B183 + B184 + B188 + B189
    sumxy = 183.*B183 + 184.*B184 + 188.*B188 + 189.*B189
    delta = sum1*sumxx - sumx**2
    constant = (sumxx*sumy - sumx*sumxy) / delta
    linear = (sum1*sumxy - sumx*sumy) / delta
    subtracted = (B185 - (constant + 185.*linear))/2. + (B186 - (constant + 186.*linear))/2.
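Read as arithmetic, the module code is an ordinary least-squares line fit through bands 183, 184, 188, and 189 (band number as x, band value as y), followed by averaging how far bands 185 and 186 sit above that fitted baseline. The sketch below wraps the same formulas in a function so they can be checked on plain numbers. The function name and the test values are hypothetical, and per-pixel scalar inputs are an assumption made for the example.

```python
def baseline_subtracted_band(b183, b184, b185, b186, b188, b189):
    """Average residual of bands 185 and 186 against a line fit to the outer bands."""
    xs = (183.0, 184.0, 188.0, 189.0)
    ys = (b183, b184, b188, b189)

    # Sums for the closed-form least-squares fit y = constant + linear * x.
    sum1 = float(len(xs))
    sumx = sum(xs)
    sumxx = sum(x * x for x in xs)
    sumy = sum(ys)
    sumxy = sum(x * y for x, y in zip(xs, ys))

    delta = sum1 * sumxx - sumx ** 2
    constant = (sumxx * sumy - sumx * sumxy) / delta
    linear = (sum1 * sumxy - sumx * sumy) / delta

    # Residuals of the two inner bands, averaged.
    residual_185 = b185 - (constant + 185.0 * linear)
    residual_186 = b186 - (constant + 186.0 * linear)
    return (residual_185 + residual_186) / 2.0


# Outer bands lie on y = 2x + 1; bands 185 and 186 sit 10 and 6 above that line.
value = baseline_subtracted_band(367.0, 369.0, 381.0, 379.0, 377.0, 379.0)
print(value)
```

When all six bands lie on one straight line the result is zero, so the new band highlights only departures from the local linear trend.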

Questions?
For the most current version of these slides, please see tutorials.opendatagroup.com

About Open Data
- Open Data began operations in 2001 and has built predictive models for companies for over ten years
- Open Data provides management consulting, outsourced analytic services, & analytic staffing
- For more information: www.opendatagroup.com, info@opendatagroup.com