Earth Data Science in The Era of Big Data and Compute

E. Lynn Usery
U.S. Geological Survey, U.S. Department of the Interior
Board on Earth Sciences and Resources
April 29, 2015

Panel Questions
How do we collect, access, manage, process, analyze, visualize, interpret, and curate Earth big data?
What are the novel approaches to the science of Earth big data, and what are the needs for future development of data cyberinfrastructure?

The National Map
The National Map includes eight data layers: land cover, structures, boundaries, hydrography, geographic names, transportation, elevation, and orthoimagery.
Public-domain data to support USGS topographic maps at 1:24,000 scale
Products and services at multiple scales and resolutions
Analysis, modeling, and other applications at multiple scales and resolutions
The National Map is built on partnerships and standards.

Products and Services of The National Map
Data Products
  National databases of base geospatial data content
  National Hydrography Dataset
  Best Practices Databases: Transportation, Structures, Boundaries (Gov Units)
  Elevation data and other lidar derivatives
  NAIP and High Resolution Orthoimagery
  National Land Cover Dataset (developed and maintained under a separate USGS program)
  Geographic Names (Geographic Names Information System and Gaz-Vector Integrated Data)
Derived Products
  US Topo
  Historical Topographic Map Collection

The National Map - Hydrography
The National Hydrography Dataset is the hydrography component of The National Map.
Represents the surface water of the United States
Complete, national, seamless coverage at 1:100,000 scale (2001) and 1:24,000 scale (2007); 1:4,800-scale level of content ongoing
National coverage at 1:24,000 scale is greater than 600 Gb of data

The National Map - Orthoimagery
National Agriculture Imagery Program (NAIP)
Acquired by the U.S. Department of Agriculture, Farm Service Agency; USGS, other Federal agencies, and States are partners
Nationwide coverage
4-band; natural color
1-meter resolution
3.75 × 3.75 minute tiles
Data at full resolution for Dent County, MO, is 800 Gb
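
As a rough illustration of how 1-meter, 4-band tiles add up, the sketch below estimates the uncompressed size of a single 3.75-minute tile. It assumes a spherical Earth (one arc-minute of latitude taken as 1,852 m), 8 bits per band, no compression, and an illustrative latitude near southern Missouri; the function and constant names are hypothetical, and real NAIP tiles are delivered on a projected grid, so this is an order-of-magnitude estimate only.

```python
import math

# Assumption: spherical Earth where one arc-minute of latitude spans ~1,852 m.
METERS_PER_ARCMIN = 1852.0


def tile_size_bytes(tile_arcmin: float, lat_deg: float,
                    resolution_m: float = 1.0, bands: int = 4,
                    bytes_per_band: int = 1) -> float:
    """Estimate the uncompressed size of a tile spanning tile_arcmin on each side."""
    height_m = tile_arcmin * METERS_PER_ARCMIN
    # East-west ground distance shrinks with the cosine of latitude.
    width_m = height_m * math.cos(math.radians(lat_deg))
    pixels = (height_m / resolution_m) * (width_m / resolution_m)
    return pixels * bands * bytes_per_band


if __name__ == "__main__":
    # Hypothetical latitude of about 37.5 N.
    size = tile_size_bytes(3.75, 37.5)
    print(f"~{size / 1e6:.0f} MB per uncompressed tile")
```

Under these assumptions a single tile is on the order of 150 MB before compression, so county- and state-scale mosaics quickly reach hundreds of gigabytes.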

The National Map - Elevation
Complete national coverage of elevation data at 10-meter resolution or better; substantial data at 3 m (1/9 arc-second)
New data collected are lidar (target resolution 1/9 arc-second)
IfSAR data in Alaska (5 m)
Lidar point cloud data also delivered as a product
USGS Lidar Base Specification
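
Elevation resolutions are quoted both in meters and in arc-seconds. A small conversion sketch, assuming a spherical Earth on which one arc-second of latitude is about 30.87 m (1,852 m / 60), shows why 1/9 arc-second is described as roughly 3 m and 1/3 arc-second as roughly 10 m. East-west spacing narrows with latitude. The function name is hypothetical.

```python
import math

# Assumption: spherical Earth; one arc-second of latitude ~ 1,852 m / 60.
METERS_PER_ARCSEC = 1852.0 / 60.0


def arcsec_to_meters(arcsec: float, lat_deg: float = 0.0) -> tuple[float, float]:
    """Return (north-south, east-west) ground spacing in meters at lat_deg."""
    ns = arcsec * METERS_PER_ARCSEC
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


if __name__ == "__main__":
    for label, arcsec in [("1 arc-second", 1.0),
                          ("1/3 arc-second", 1 / 3),
                          ("1/9 arc-second", 1 / 9)]:
        ns, ew = arcsec_to_meters(arcsec, lat_deg=40.0)
        print(f"{label}: {ns:.2f} m N-S, {ew:.2f} m E-W at 40 N")
```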

The National Map - Elevation: Quality Levels
Each quality level is defined by horizontal point spacing (meters), vertical accuracy (centimeters), and a description:
QL1: High accuracy and resolution lidar (example: lidar data collected in the Pacific Northwest)
QL2: Medium-high accuracy and resolution lidar
QL3: Medium accuracy and resolution lidar, analogous to USGS specification v. 13 and most data collected to date
QL4: Early or lower quality lidar, and photogrammetric elevations produced from aerotriangulated NAIP imagery
QL5: Lower accuracy and resolution, primarily from IfSAR

3D Elevation Program (3DEP)
The 3DEP initiative implements one of the 10 program scenarios resulting from the National Enhanced Elevation Assessment (NEEA) study.
Key 3DEP goals:
  QL2 lidar data over the conterminous United States, Hawaii, and the territories on an eight-year cycle
  QL5 IfSAR data over Alaska
  Lidar point cloud data to be publicly accessible
  Multiple derivative products supported as services and freely available
3DEP is a USGS program initiative; operational distribution began in 2015.

3DEP Data Volumes
For the purposes of the infrastructure assessment, 3DEP data volume is estimated at 9.4 PB.
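
To see how lidar collections grow toward petabyte scale, a back-of-envelope model multiplies area, point density, and bytes per point record. Every input here is an assumption chosen for illustration: the area, the density of 2 points per square meter, and a 30-byte record (the size of a LAS 1.4 point data record format 6). The result is not a reconstruction of the 9.4 PB assessment figure, and the function names are hypothetical.

```python
def point_cloud_bytes(area_km2: float, points_per_m2: float,
                      bytes_per_point: int = 30) -> float:
    """Back-of-envelope size of an uncompressed lidar point cloud."""
    area_m2 = area_km2 * 1e6
    return area_m2 * points_per_m2 * bytes_per_point


def to_petabytes(n_bytes: float) -> float:
    return n_bytes / 1e15


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 km^2 at 2 points/m^2 with 30-byte records.
    raw = point_cloud_bytes(1_000_000, 2.0)
    print(f"raw points: {to_petabytes(raw):.2f} PB")
```

The model is linear in each input, so doubling the point density or the record size doubles the estimate.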

Example Areas of Application of 3DEP Elevation Data
Precision Farming
Land Navigation and Safety
Geologic Resources and Hazards Mitigation
Natural Resource Conservation
Infrastructure Management
Flood Risk Mitigation

USGS Big Data
Big data is not simply a large volume of data.
USGS has collected, processed, and distributed petabytes of data for decades.
We process and use these data for earth science applications with a divide-and-conquer approach:
  Quadrangle mapping is a divide-and-conquer approach.
  Staging data and setting scale thresholds for viewing are divide-and-conquer approaches.
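
Quadrangle mapping divides space into fixed cells that can each be processed on their own, with the results merged afterward. The minimal sketch below assumes 7.5-minute cells (the 1:24,000-scale quadrangle size) aligned to multiples of 7.5 minutes, a spherical Earth of radius 6,371 km, and cell area as a stand-in for per-quadrangle work. The function names and the bounding box are hypothetical.

```python
import math
from typing import Iterator, Tuple

# 7.5 arc-minutes in degrees (exactly 0.125), the 1:24,000 quadrangle size.
CELL_DEG = 7.5 / 60.0
EARTH_RADIUS_KM = 6371.0  # assumed mean radius for a spherical model


def quadrangles(min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> Iterator[Tuple[float, float]]:
    """Yield the (west, south) corner of each 7.5-minute cell covering the box."""
    col0, col1 = math.floor(min_lon / CELL_DEG), math.ceil(max_lon / CELL_DEG)
    row0, row1 = math.floor(min_lat / CELL_DEG), math.ceil(max_lat / CELL_DEG)
    for row in range(row0, row1):
        for col in range(col0, col1):
            yield col * CELL_DEG, row * CELL_DEG


def cell_area_km2(corner: Tuple[float, float]) -> float:
    """Divide step: work on one cell that needs no data from any other cell."""
    _west, south = corner  # longitude does not affect area on a sphere
    d_lon = math.radians(CELL_DEG)
    return EARTH_RADIUS_KM ** 2 * d_lon * (
        math.sin(math.radians(south + CELL_DEG)) - math.sin(math.radians(south)))


def region_area_km2(bbox: Tuple[float, float, float, float]) -> float:
    # Conquer step: independent per-cell results combine by simple addition.
    return sum(cell_area_km2(c) for c in quadrangles(*bbox))


if __name__ == "__main__":
    bbox = (-91.75, 37.25, -91.25, 37.75)  # hypothetical 0.5 x 0.5 degree box
    print(len(list(quadrangles(*bbox))), "quadrangles")
    print(f"{region_area_km2(bbox):.1f} km^2")
```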

Big Data
Big data occurs when divide and conquer will not work and the solution requires handling all of the data.
This is similar to global operations in image processing, as opposed to local or neighborhood operations.
Example with 3DEP data: watershed modeling and analysis must handle data for the entire watershed, and with lidar data at QL2 resolution this is a big data problem.
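
Why watershed analysis resists tiling can be seen in a toy flow-accumulation example. Flow accumulation counts how many cells drain through each cell, so its value at an outlet depends on every upstream cell. The sketch uses a simplified D8-style rule (each cell drains to its lowest strictly lower neighbor among eight, ignoring diagonal distance) on a small hypothetical DEM. It illustrates a global operation and is not a USGS watershed workflow.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]

# Offsets to the 8 neighbors of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_accumulation(dem: Grid) -> List[List[int]]:
    """Count the cells (including itself) whose flow passes through each cell.

    Simplified D8-style routing: each cell drains to its lowest strictly lower
    neighbor (diagonal distance ignored); cells with none are outlets.
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    # Highest cells first, so each total is complete before it is passed on.
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for z, r, c in order:
        target: Optional[Tuple[int, int]] = None
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < z:
                if target is None or dem[nr][nc] < dem[target[0]][target[1]]:
                    target = (nr, nc)
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


def tilted_dem(rows: int, cols: int) -> Grid:
    """Hypothetical plane sloping down toward the bottom-right corner."""
    return [[float((rows - 1 - r) + (cols - 1 - c)) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    dem = tilted_dem(4, 6)
    whole = flow_accumulation(dem)
    right_tile = flow_accumulation([row[3:] for row in dem])  # columns 3-5 only
    print("outlet, whole watershed:", whole[3][5])
    print("outlet, right tile only:", right_tile[3][2])
```

On the whole grid the outlet collects all 24 cells. Computed on the right-hand tile alone it sees only 12, so the tile-local answer is wrong however quickly each tile is processed.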

Novel approaches
Parallel computing, but for our problems parallel input/output operations are critical
Geospatial data are generally well suited to parallel approaches: segment geographic space and send each spatial component to a different processor
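
Segmenting space and handing each piece to a separate worker can be sketched with Python's standard library. This version writes a synthetic raster as horizontal strips, one file per strip, then reads and summarizes the strips concurrently with `concurrent.futures.ThreadPoolExecutor`. Threads suit the I/O-bound reading step emphasized here. CPU-heavy per-strip work would more typically use `ProcessPoolExecutor`. The file layout, strip size, and function names are assumptions.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_strips(directory: str, rows: int, cols: int, strip_rows: int) -> list[str]:
    """Split a synthetic raster into horizontal strips, one binary file each."""
    paths = []
    for start in range(0, rows, strip_rows):
        stop = min(start + strip_rows, rows)
        values = [float(r * cols + c) for r in range(start, stop) for c in range(cols)]
        path = os.path.join(directory, f"strip_{start:05d}.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(f"<{len(values)}d", *values))
        paths.append(path)
    return paths


def summarize_strip(path: str) -> tuple[int, float]:
    """Read one strip from disk and return (cell count, sum of values)."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 8  # 8 bytes per little-endian double
    return n, sum(struct.unpack(f"<{n}d", data))


def parallel_sum(paths: list[str], workers: int = 4) -> tuple[int, float]:
    # Each strip is read and reduced independently; partial results add up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(summarize_strip, paths))
    return sum(n for n, _ in results), sum(s for _, s in results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        strips = write_strips(tmp, rows=100, cols=50, strip_rows=16)
        print(len(strips), "strips:", parallel_sum(strips))
```

A sum decomposes cleanly across strips, so the merge step is trivial. Operations that need the whole dataset at once do not split this way.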

Novel approaches
Move processing to the data: once data are loaded on our servers, we do not move the data again.
Use server-side processing on the computer on which the data are stored.
Build cyberinfrastructure to support big data processing; network speeds are critical.
Parallel operations require rethinking how we build systems and software.
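
Moving processing to the data can be illustrated by comparing what crosses the network in two designs. In the first, the raster is shipped to the client. In the second, a small function runs on the server and only its result is returned. The `DataServer` class below is a hypothetical stand-in for a server that keeps data in place. It counts pickled bytes as a rough proxy for transfer cost and does not model any specific USGS system.

```python
import pickle
from typing import Any, Callable, List

Raster = List[List[float]]


class DataServer:
    """Hypothetical server that holds a raster in place and tallies bytes sent."""

    def __init__(self, raster: Raster):
        self._raster = raster
        self.bytes_sent = 0

    def download(self) -> Raster:
        # Client-side processing: the whole raster crosses the network.
        payload = pickle.dumps(self._raster)
        self.bytes_sent += len(payload)
        return pickle.loads(payload)

    def execute(self, fn: Callable[[Raster], Any]) -> Any:
        # Server-side processing: run next to the data, ship only the result.
        payload = pickle.dumps(fn(self._raster))
        self.bytes_sent += len(payload)
        return pickle.loads(payload)


def max_elevation(raster: Raster) -> float:
    return max(max(row) for row in raster)


if __name__ == "__main__":
    raster = [[float(r + c) for c in range(500)] for r in range(500)]

    client_side = DataServer(raster)
    print(max_elevation(client_side.download()), client_side.bytes_sent)

    server_side = DataServer(raster)
    print(server_side.execute(max_elevation), server_side.bytes_sent)
```

Both designs return the same answer, but the server-side call sends a few dozen bytes instead of megabytes.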
