Big Data in OpenTopography



Similar documents
AT&T Global Network Client for Windows Product Support Matrix January 29, 2015

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

Enhanced Vessel Traffic Management System Booking Slots Available and Vessels Booked per Day From 12-JAN-2016 To 30-JUN-2017

Analysis One Code Desc. Transaction Amount. Fiscal Period

Case 2:08-cv ABC-E Document 1-4 Filed 04/15/2008 Page 1 of 138. Exhibit 8

Managing Lidar (and other point cloud) Data. Lindsay Weitz Cody Benkelman

Ashley Institute of Training Schedule of VET Tuition Fees 2015

Centers of Academic Excellence in Cyber Security (CAE-C) Knowledge Units Review

Introduction to Imagery and Raster Data in ArcGIS

SEO Presentation. Asenyo Inc.

Computing & Telecommunications Services Monthly Report March 2015

SigMo Platform based approach for automation of workflows in large scale IT-Landscape. Tarmo Ploom 2/21/2014

The following was presented at DMT 14 (June 1-4, 2014, Newark, DE).

Lidar 101: Intro to Lidar. Jason Stoker USGS EROS / SAIC

PACE Predictive Analytics Center of San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

Energy Savings from Business Energy Feedback

Rapid Consumption and Deployment of SAP Software as Virtual Appliances Using SAP Cloud Appliance Library

Advanced Image Management using the Mosaic Dataset

CENTERPOINT ENERGY TEXARKANA SERVICE AREA GAS SUPPLY RATE (GSR) JULY Small Commercial Service (SCS-1) GSR

Deploying ArcGIS for Server using Managed Services

White Paper: Efficient Management of Cloud Resources

FY 2015 Schedule at a Glance

PowerSteering Product Roadmap Your Success Is Our Bottom Line

Academic Calendar Arkansas State University - Jonesboro

CAFIS REPORT

Alexandria Overview. Sept 4, 2015

Consumer ID Theft Total Costs

Portal for ArcGIS. Satish Sankaran Robert Kircher

P/T 2B: 2 nd Half of Term (8 weeks) Start: 25-AUG-2014 End: 19-OCT-2014 Start: 20-OCT-2014 End: 14-DEC-2014

P/T 2B: 2 nd Half of Term (8 weeks) Start: 26-AUG-2013 End: 20-OCT-2013 Start: 21-OCT-2013 End: 15-DEC-2013

P/T 2B: 2 nd Half of Term (8 weeks) Start: 24-AUG-2015 End: 18-OCT-2015 Start: 19-OCT-2015 End: 13-DEC-2015

oct 03 / 2013 nov 12 / oct 05 / oct 07 / oct 21 / oct 24 / nov 07 / 2013 nov 14 / 2013.

IR Best Practice & the Tools Needed to Achieve it

LEVERAGE VBLOCK SYSTEMS FOR Esri s ArcGIS SYSTEM

ERDAS IMAGINE The world s most widely-used remote sensing software package

Andrew Pylyp. Capital Market Day. Managing Director Wer liefert was? Stockholm 27. November 2006

End of Life Content Report November Produced By The NHS Choices Reporting Team

Bb Upgrade Timeline. Oct Nov Dec Jan 2011 Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan PDFmyURL.com

Managing Open Source Code Best Practices

Office of the Government Chief Information Officer The Government of the Hong Kong Special Administrative Region

ENVI THE PREMIER SOFTWARE FOR EXTRACTING INFORMATION FROM GEOSPATIAL IMAGERY.

NetCDF and HDF Data in ArcGIS

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

From: Steve Berberich, Vice President of Technology and Corporate Services and Chief Financial Officer

North Carolina Department of Cultural Resources Digital Preservation Plan. State Archives of North Carolina State Library of North Carolina

Enterprise GIS Solutions to GIS Data Dissemination

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

From Virtualized to ITaaS. Copyright 2011 EMC Corporation. All rights reserved.

Perkhidmatan Kerajaan Pintar : Meningkatkan Kepuasan Pelanggan

A Web-Based Library and Algorithm System for Satellite and Airborne Image Products

Department of Public Welfare (DPW)

Software Re-Engineering and Ux Improvement for ElegantJ BI Business Intelligence Suite

Rights and Scheduling: Vision Broadcast Master. Lee Sheppard, Product Line Manager Client Conference, March 2013

Harvard Data Visualization Project

Deploying ArcGIS for Server Using Managed Services

2015 UNFI Marketing Program Rates, 6/30/14

Annexure B: Planning, Budgeting and Performance Management Programme

Building an Effective Roadmap

MD imap 2.0 THE NEXT GENERATION OF MARYLAND S ENTERPRISE GIS. Esri MUG Conference Baltimore, MD December 3,

Applying ICT and IoT to Multifamily Buildings. U.S. Department of Energy Buildings Interoperability Vision Meeting March 12, 2015 Jeff Hendler, ETS

ArcGIS Data Models Practical Templates for Implementing GIS Projects

webmethods Mobile Designer June 2011

Building your Server for High Availability and Disaster Recovery. Witt Mathot Danny Krouk

Important Dates Calendar FALL

PEER BENCHMARKING. A Powerful Tool for IT Portfolio Planning. Noah Wittman, Educational Technology Services, UC Berkeley UCCSC - 04 August 2014

Cost effective methods of test environment management. Prabhu Meruga Director - Solution Engineering 16 th July SCQAA Irvine, CA

1. Introduction. 2. User Instructions. 2.1 Set-up


Project Title: Project PI(s) (who is doing the work; contact Project Coordinator (contact information): information):

Altix Usage and Application Programming. Welcome and Introduction

Supervisor Instructions for Approving Web Time Entry

Roles: Scrum Master & Project Manager

2012 CBECS: Preliminary Results and Timeline for Data Release

AgriLife Information Technology IT General Session January 2010

Sea Water Heat Pump Project

BCOE Payroll Calendar. Monday Tuesday Wednesday Thursday Friday Jun Jul Full Force Calc

WEATHERHEAD EXECUTIVE EDUCATION COURSE CATALOG

NICE and Framework Overview

SAR Archive and Community Support Activities at UNAVCO

Network Analysis with ArcGIS Online

Trends in Global Capacity Availability and Trading. Bruce Girdlestone VP Network Trading Band-X

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Newfoundland and Labrador Hydro Electricity Rates

Climatography of the United States No

Transcription:

Big Data in OpenTopography Vishu Nandigam San Diego Supercomputer Center NSF Big Data in Educa<on Workshop Jan 28-29 2015

Presenta<on Overview Lidar and OpenTopography Data and Workflow Cyberinfrastructure Data Growth and Challenges Data Insights Research and Development

LIDAR LIght Detec<on And Ranging (aka airborne laser swath mapping) Billions of of accurate distance measurements with a scanning laser rangefinder + GPS + Iner<al Measurement Unit (IMU) Point cloud (x,y,z coordinates) = fundamental LIDAR data product Image: David Haddad, AGS

Canopy Height (ft)

OpenTopography NSF Earth Science Facility: 3 year support in 2009. Renewed in 2012 (Award No. 1226353 & 1225810 EAR/IF) CI and Science Collabora3on SDSC, ASU & UNAVCO Related research efforts NASA ROSES: Extend to Satellite- based LIDAR (waveform data) NSF SI2 CyberGIS: OpenTopo as an exemplar of cyber GIS NSF CluE: Inves<gate Computer Science issues in big data Partnerships with state and local agencies to support data hos<ng and processing capabili<es

OpenTopography Facility Overview

OpenTopography Data Large user community with variable needs and levels of sophis<ca<on. Goal: maximize access to data to achieve greatest scien<fic impact. Big data treat data as an asset that can be used and reused Co- locate data with on- demand processing

Data Workflow 1. Original Source Data from Collector (eg. NCALM) 2. Extract relevant data products 3. Source data is archived in Chronopolis digital preserva<on network 1. UCSD Library (Library of Congress) 2. Three geographically distributed copies of the data 4. Extracted data products go through QA/QC 5. Data transforma<on and op<miza<on 1. Error correc<on 2. Projec<on conversion 6. Update Metadata ISO 19115 (Data) 7. Generate addi<onal derived products (e.g. GE hillshades) Data available via OT

Catalog Service for the Web / DOI (8 &9 of the Data Workflow) CSW Catalog ESRI Geoportal Server ISO 19115 (Data) CZO, CyberGIS, Thomson Reuters Web of Science DOI: hmp://dx.doi.org/10.5069/ G9ST7MR9 EZID is a service of UC Cura<on Center of the CDL Protocol: 'CSW' Profile: 'ArcGIS Server Geoportal Extension

Current Data Holdings Lidar Point Cloud Datasets 770 Billion+ lidar returns. Each return has addi<onal amributes On- demand processing capabili<es SRTM, Raster (mul<ple layers) e.g. Sonoma - several intensity products, canopy height, bare earth, hydro- enforced bare earth, canopy top, etc. 30+ TB of on- demand online data (Excluding big custom job runs, original archived data, pre processing data and user generated products)

OT CyberInfrastructure OpenTopography Portal Other Clients Application Tier CSW PC Data Access SRTM Service Data Access Service PC Data Processing SRTM Data Service Processing Service DOI Point Cloud Data SRTM Dataset Data and Processing Gordon Nodes SDSC Cloud OT Resources XSEDE Resources SDSC Resources Infrastructure Tier

OT Challenges (Keeping pace with data and user growth and advancing science!)

LIDAR Point Cloud Data Growth 160 140 120 100 80 60 40 OpenTopography 20 0 Summer 2005 fall2005 winter 2006 spring 2006 Summer 2006 fall 2006 winter 2007 spring 2007 summer 2007 fall 2007 winter 2008 spring 2008 Summer 2008 Fall 2008 Winter 2009 Spring 2009 summer 2009 Fall 2009 Winter 2010 Spring 2010 Summer 2010 Fall 2010 Winter 2011 Spring 2011 Summer 2011 Fall 2011 Winter 2012 Spring 2012 Summer 2012 Fall 2012 Winter 2013 Spring 2013 Summer 2013 Fall 2013 Winter 2014 Spring 2014

Sensor Hardware Technology Rapid Evolu<on of Laser Scanner Technology Data is being collected on mul<ple channels (different wavelengths) and capturing full waveform data. Early datasets collected with scanners opera<ng at less than 33Khz. Current systems collect data at ~900Khz Greater Resolu<on = Larger Data Volumes Image: RIEGL USA

Diverse and complex datasets support Discrete return lidar Full waveform lidar Op<cal imagery (R,G,B orthophotography) Hyperspectral imagery NEON Hyperspectral Imagery collected light reflected across the electromagne<c spectrum for a total of 426 bands of informa<on. Image: Nathan Leisso, NEON AOP

User Growth 7000 OT registered user growth 6000 5000 4000 3000 2000 1000 0 Aug- 08 Nov- 08 Feb- 09 May- 09 Aug- 09 Nov- 09 Feb- 10 May- 10 Aug- 10 Nov- 10 Feb- 11 May- 11 Aug- 11 Nov- 11 Feb- 12 May- 12 Aug- 12 Nov- 12 Feb- 13 May- 13 Aug- 13 Nov- 13 Feb- 14 May- 14 Aug- 14 Nov- 14 CyberGIS, NASA/UNAVCO communi<es Service Level Access

Pluggable Services Infrastructure Methods for scien<fic data processing are evolving Users demanding more processing services Pluggable services infrastructure OT development sandbox Assist researchers with their code Deployment of the algorithm as a service Update processing workflow and UI Increase in user Generated Derived products!

Data Insights What can we learn from: 40,000+ custom PC jobs 1.2 trillion lidar points processed 15,000+ custom raster jobs (past year)

Data Access Pamerns Source: hmp://www.businessinsider.com/chart- of- the- day- the- lifecycle- of- a- youtube- video- 2010-5

Access Pamerns in LIDAR Scien<fic Datasets B4 San Andreas Fault 160 140 120 100 80 60 40 20 0

Access Pamerns in LIDAR Scien<fic Datasets Event based datasets - El Mayor- Cucapah Earthquake 140 120 100 80 60 40 20 0

Access Pamerns in LIDAR Scien<fic Datasets Event based datasets Hai< Earthquake 160 140 120 100 80 60 40 20 0

Data Usage Analy<cs Northern San Andreas Fault Social networking with data Recommenda<on system

Understanding Traffic Pamerns Impact of dataset releases and workshops on traffic Page Visits 14000 12000 10000 8000 6000 4000 2000 0 Jan- 13 Feb- 13 Mar- 13 Apr- 13 May- 13 Jun- 13 Jul- 13 Aug- 13 Sep- 13 Oct- 13 Nov- 13 Dec- 13 New Users Registered users Daily visits SRTM Global Dataset Released 200 180 160 140 120 100 80 60 40 20 0 Registered Users

Tiered storage based on Data Access Ac<vity based data ranking and <ered cloud integrated storage SSD New and existing most active data Regular Disk Between Hot and Cool Cloud Least Active

Cloud Compu<ng Cost effec<veness & feasibility of data science facili<es on the cloud Microsos Azure for Research Award Integra<on of cloud based on- demand geospa<al processing services into community earth science data facili<es.

Leveraging HPC Dedicated Gordon nodes via XSEDE (democra<za<on of supercompu<ng resources) GEO Data and Cyberinfrastructure Impera<ve: Harness the Power of Compu<ng and Computa<onal Infrastructure. - GEO Priori3es and Fron3ers: 2015-2020

Summary OpenTopography is an modern agile data facility Cyberinfrastructure driven by science use cases CI and science collabora<on Big Data needs to be usable - Community not only wants access to data but also wants tools for processing these data. Concept of OT can be used as a template for other large data facili<es

Thank You! White River, IN. Credit: Indiana Geological Survey/The State of Indiana viswanat@sdsc.edu info@opentopography.og @OpenTopography