Lesley Wyborn and Ben Evans. nci.org.au. IEEE Big Data in the Geosciences Workshop, Santa Clara 2015. National Computational Infrastructure 2015


1 Integrating Big Geoscience Data into the Petascale National Environmental Research Interoperability Platform (NERDIP): Successes and Unforeseen challenges. Lesley Wyborn and Ben Evans

2 The Australian Research Funding Schemes
Two main tranches of funding:
- National Collaborative Research Infrastructure Strategy (NCRIS): $542M for ($75M for cyberinfrastructure)
- Super Science Initiative: $901 million for ($347M for cyberinfrastructure)
Annual maintenance funding of around $180M pa since
All programmes were designed to ensure that Australian research continues to be competitive and rank highly on an international scale.
[Chart: Millions of Dollars by category: Compute, Data, Tools, Networks]

3 We went from nothing.. to...

4 The Research Data Storage Infrastructure
Progress on Data Ingest as of 16 October, 2015: ~43 Petabytes in 8 distributed nodes
[Chart label: 10 PBytes]
Source: https://

5 Integrated World-class Scientific Computing Environment at NCI
[Diagram labels: Data Services THREDDS; Server-side analysis and visualization; VDI: Cloud scale user desktops on data; 10PB+ Research Data; Web-time analytics software]

6 National Environment Research Data Collections (NERDC)
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections

Data Collections: Approx. Capacity
CMIP5, CORDEX: 2 PBytes
ACCESS products: 3.3 PBytes
LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS: 2 PBytes
Digital Elevation, Bathymetry, Onshore Geophysics: 400 TBytes
Seasonal Climate: 600 TBytes
Bureau of Meteorology Observations: 400 TBytes
Bureau of Meteorology Ocean-Marine: 220 TBytes
Terrestrial Ecosystem: 290 TBytes
Reanalysis products: 175 TBytes

7 10+ PB of Data for Interdisciplinary Science
[Chart of collection sizes: Astronomy (Optical) 200 TB; CMIP5 3 PB; Earth Observ. 2 PB; Atmosphere 2.4 PB; Water Ocean 1.5 PB; Weather 340 TB; Bathy, DEM 100 TB; Marine Videos 10 TB; Geophysics 300 TB. Legend: GA, CSIRO, ANU, Other National, International]

8 Managing 10+ PB of Data for Scalable In-situ Access
Combined and integrated, the NCI collections are too large to move:
- bandwidth limits the capacity to move them easily
- the data transfers are too slow, complicated and too expensive
- even if our data can be moved, few can afford to store 10 PB on spinning disk
We need to change our focus to:
- moving users to the data (for sophisticated analysis)
- moving processing to data
- having online applications to process the data in-situ
- improving the sophistication of users with our help
We called for a new form of system design where:
- storage and various types of computation are co-located
- systems are programmed and operated to allow users to interactively invoke different forms of analysis in-situ over integrated large-scale data collections
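To make the "too large to move" point concrete, the sketch below estimates how long a bulk transfer of a 10 PB collection might take. The link speeds and the 80% sustained-efficiency factor are illustrative assumptions, not figures from the talk, and decimal units (1 PB = 10^15 bytes) are assumed.

```python
# Back-of-envelope estimate of bulk transfer time for a large collection.
# Link speeds and the efficiency factor are illustrative assumptions.

SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_gbit_per_s: float, efficiency: float = 0.8) -> float:
    """Days needed to move size_bytes over a link of the given nominal speed."""
    usable_bytes_per_s = link_gbit_per_s * 1e9 / 8 * efficiency
    return size_bytes / usable_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    ten_pb = 10 * 10**15  # 10 PB in decimal bytes (assumed unit convention)
    for gbit in (1, 10, 100):
        print(f"{gbit:>3} Gbit/s: {transfer_days(ten_pb, gbit):8.1f} days")
```

Under these assumptions a dedicated 10 Gbit/s link would need roughly 116 days to move 10 PB, which is one way to see why the slide argues for moving users and processing to the data instead.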

9 Rethinking Hardware Architectures for Data-intensive Science
Work at NCI has also highlighted the need for balanced systems to enable Data-intensive Science, including:
- Interconnecting processes and high throughput to reduce inefficiencies
- The need to really care about placement of data resources
- Better communications between the nodes
- I/O capability to match the computational power
- Close coupling of cluster, cloud and storage
[Figure: NCI's Integrated High Performance Environment]

10 My take is that Big Data is not just about the V's
1. Volume: data at rest
2. Velocity: data in motion (streaming)
3. Variety: many types, forms and structures (or no structures)
4. Veracity: trustworthiness, provenance, lineage, quality
5. Validity: data that is correct
6. Visualization: data in patterns
7. Vulnerability: data at risk
8. Value: data that is meaningful
9. V?????
10. V?????

11 Big Data vs High Performance Data
Big Data is a relative term where the volume, velocity and variety of data exceed an organisation's storage or compute capacity for accurate and timely decision making.
We define High Performance Data (HPD) as data that is carefully prepared, standardised and structured so that it can be used in Data-Intensive Science on HPC (Evans et al., 2015).
To get on top of the Data Tsunami, we need to convert Big Data collections into HPD by:
- Aggregating data into seamless pre-processed data products
- Creating hyper-cubes and self-describing data arrays
1964: 1 KB = 2 m of tape or ~20 cards
2014: a 4 GB thumb drive = ~8000 km of tape or ~83 million cards
2014: 20 PB of modern storage = ~32 trillion metres of tape or ~320 trillion cards
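The tape and punched-card comparison can be re-derived from the 1964 rates quoted on the slide. This is a small worked check, not part of the original talk; binary units (1 KB = 1024 bytes) are an assumption, since the slide does not say which convention it uses.

```python
# Re-derive the slide's tape and punched-card equivalents from the 1964 rates.
# Binary units (1 KB = 1024 bytes) are an assumption; the slide does not say.

TAPE_M_PER_KB = 2   # 1964: 1 KB = 2 m of tape (from the slide)
CARDS_PER_KB = 20   # 1964: 1 KB = ~20 cards (from the slide)
KB = 1024


def equivalents(size_bytes: int) -> tuple[float, float]:
    """Return (metres of tape, number of cards) for a size given in bytes."""
    kilobytes = size_bytes / KB
    return kilobytes * TAPE_M_PER_KB, kilobytes * CARDS_PER_KB


if __name__ == "__main__":
    for label, size in (("4 GB", 4 * KB**3), ("20 PB", 20 * KB**5)):
        tape_m, cards = equivalents(size)
        print(f"{label}: {tape_m / 1000:,.0f} km of tape, {cards:,.0f} cards")
```

For 4 GB this reproduces the slide's ~8000 km of tape and ~83 million cards. For 20 PB the same rates give roughly 44 trillion metres and 440 trillion cards (about 40 and 400 trillion with decimal units), somewhat above the slide's figures, so the 20 PB line may rest on a different size or rounding.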

12 Creating HPD collections: e.g. the Landsat Cube
The Landsat cube arranges 636,000 Landsat source scenes spatially and temporally, to allow flexible but efficient large-scale in-situ analysis. The data is partitioned into spatially-regular, time-stamped, band-aggregated tiles which are presented as temporal stacks.
[Figure labels: Temporal Stack; Spatially partitioned tiles]
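The partitioning described on this slide can be pictured as a small indexing scheme: each observation is keyed by a regular spatial tile, and each tile holds a time-ordered stack. The sketch below is only an illustration. The 1-degree tile size, the `TileKey` and `TileStore` names and the in-memory dictionary are assumptions; the slide does not say how the Landsat cube is implemented.

```python
# Sketch of spatially-regular, time-stamped tiles presented as temporal stacks.
# Tile size, class names and in-memory storage are illustrative assumptions.
import math
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime

TILE_SIZE_DEG = 1.0  # assumed tile edge length in degrees


@dataclass(frozen=True)
class TileKey:
    x: int  # index of the tile's western edge, in tile units
    y: int  # index of the tile's southern edge, in tile units


def tile_for(lon: float, lat: float) -> TileKey:
    """Map a coordinate to the regular tile that contains it."""
    return TileKey(math.floor(lon / TILE_SIZE_DEG), math.floor(lat / TILE_SIZE_DEG))


@dataclass
class TileStore:
    # Each tile key maps to a list of (timestamp, payload) pairs.
    stacks: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, lon: float, lat: float, when: datetime, payload: str) -> None:
        self.stacks[tile_for(lon, lat)].append((when, payload))

    def temporal_stack(self, key: TileKey) -> list:
        """Return the tile's observations ordered by acquisition time."""
        return sorted(self.stacks[key], key=lambda item: item[0])


if __name__ == "__main__":
    store = TileStore()
    store.add(149.1, -35.3, datetime(2010, 5, 1), "scene-b")
    store.add(149.9, -35.9, datetime(2005, 3, 9), "scene-a")
    key = tile_for(149.5, -35.5)
    print(key, [name for _, name in store.temporal_stack(key)])
```

Because every tile has the same footprint, an analysis over a region and time range reduces to selecting tile keys and slicing their stacks, which is the property the slide credits with enabling efficient in-situ analysis.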

13 Current Landsat Holdings Reformatted as HPD
4M Spatially-Regular Time-Stamped Tiles (0.5 PB)
636,000 Landsat Source Scene Datasets (~52 x Pixels)

14 High-Res, Multi-Decadal, Continental-Scale Analysis: Water Detection from Space
- 15 years of data from LS5 & LS7 ( )
- 25 m nominal pixel resolution
- Approx. 133,000 individual source scenes in approx. 12,400 passes
- Entire archive of 1,312,087 ARG25 tiles => 21 x 10^12 pixels can be processed in ~8 hours
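The processing rate implied by these figures can be worked out directly. The sketch below uses only the numbers on the slide (1,312,087 tiles, 21 x 10^12 pixels, ~8 hours); because the pixel count and the duration are rounded, the results are approximate.

```python
# Throughput implied by the slide: 1,312,087 tiles (21 x 10^12 pixels) in ~8 hours.
TILES = 1_312_087
PIXELS = 21e12
HOURS = 8


def rates(tiles: int, pixels: float, hours: float) -> dict:
    """Derive per-tile size and per-second throughput from the totals."""
    seconds = hours * 3600
    return {
        "pixels_per_tile": pixels / tiles,
        "tiles_per_second": tiles / seconds,
        "pixels_per_second": pixels / seconds,
    }


if __name__ == "__main__":
    for name, value in rates(TILES, PIXELS, HOURS).items():
        print(f"{name}: {value:,.0f}")
```

This works out to about 16 million pixels per tile, about 46 tiles per second, and roughly 7.3 x 10^8 pixels per second sustained across the whole archive.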

15 Data Platforms of today need to scale down to small users

16 Scaling down to the smaller users Do we enable individual scenes to be downloaded for locally hosted small scale analysis? Or do we facilitate small scale analysis, in-situ on data sets that are dynamically updated?

17 Introducing the National Environmental Data Interoperability Research Platform (NERDIP)
[Layered architecture diagram, top to bottom]
Workflow Engines, Virtual Laboratories (VLs), Science Gateways: Biodiversity & Climate Change VL; Climate & Weather Systems Lab; eMAST; Speddexes; eReefs; AGDC VL; All Sky Virtual Observatory; VGL; Globe Claritas; VHIRL; Open Nav Surface
Tools: Ferret, NCO, GDL, GDAL, GRASS, QGIS; Models (Fortran, C, C++, MPI, OpenMP); Python, R, MatLab, IDL; Visualisation (Drishti)
Data Portals: ANDS/RDA; AODN/IMOS; TERN; AuScope; Data.gov.au; Digital Bathymetry & Elevation
National Environmental Research Data Interoperability Platform (NERDIP):
Services Layer (expose data models & semantics): Direct Access; Fast whole-of-library catalogue; RDF, LD; OPeNDAP; WMS, WFS, WCS, WPS, SOS, W*S
Metadata Layer: netCDF-CF, HDF-S, ISO 19115, RIF-CS, DCAT, etc.
Data Library Layer 1: Climate/Weather/Ocean; Libgdal; FITS; Airborne Geophysics Line data; SEG-Y; LAS LiDAR; BAG
HP Data Library Layer 2: HDF5 MPI-enabled; HDF5 Serial
Storage: Lustre; Other Storage (options)

18 NERDIP: Enabling Multiple Ways to Interact with the Data
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Ace Users; Data Discovery; Data Platform]

19 NERDIP: Enabling Multiple Ways to Interact with the Data
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Ace Users; Data Platform]

20 Platforms Free Data from the Prison of the Portals
Portals are for visiting, platforms are for building on.
- Portals present aggregated content in a way that invites exploration, but the experience is pre-determined by a set of decisions by the builder about what is necessary, relevant and useful.
- Platforms put design decisions into the hands of users: there are innumerable ways of interacting with the data.
- Platforms offer many more opportunities for innovation: new interfaces can be built, new visualisations framed, and ultimately new science rapidly emerges.
Tim Sherratt, http://

21 NERDIP: Enabling Ace Users to Interact with the Data
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Ace Users; Data Discovery; Data Platform]

22 NERDIP: Enabling Ace Users to Interact with the Data
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Data Discovery; Data Platform]

23 NERDIP: Enabling Application Developers to Interact with the Data
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Application Focussed Developers; Ace Users; Data Discovery; Data Management Focussed Developers; Data Platform]

24 NERDIP Territorial Wars: Application Developers vs Data Managers
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Application Focussed Developers; Ace Users; Data Discovery; Data Management Focussed Developers; Data Platform]

25 NERDIP: Applications Replicating Ways of Interacting with the Data
[NERDIP layer diagram repeated from slide 17]

26 NERDIP: Loosely coupling Applications and Data via a Services Layer
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Application Focussed Developers; Ace Users; Data Discovery; Services Interface; Data Management Focussed Developers; Data Platform]

27 NERDIP: Loosely coupling Applications and Data via a Services Layer
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Application Focussed Developers; Ace Users; Data Discovery; Data Management Focussed Developers; Data Platform]
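One way to picture the loose coupling named on these slides is that an application needs only a service endpoint and a standard request, not knowledge of file formats or storage. The sketch below builds an OGC WMS 1.3.0 GetMap request with the Python standard library. The endpoint URL and layer name are hypothetical placeholders, not real NCI services, and no request is actually sent.

```python
# Build a WMS 1.3.0 GetMap URL; the endpoint and layer name are hypothetical.
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(endpoint: str, layer: str, bbox: tuple, width: int = 512, height: int = 512) -> str:
    """Return a GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses lon/lat axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    url = getmap_url(
        "https://example.org/wms",   # hypothetical endpoint
        "hypothetical_layer",        # hypothetical layer name
        (110.0, -45.0, 155.0, -10.0),
    )
    print(url)
    print(parse_qs(urlsplit(url).query, keep_blank_values=True)["BBOX"])
```

The application code stays the same if the data behind the service is reorganised or moved to a different storage layer, which is the separation between application-focussed and data-management-focussed developers that the diagram highlights.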

28 NERDIP: the Metadata Layer
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Ace Users; Data Discovery]

29 NERDIP: The Data Layers
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Infrastructure to Lower Barriers to Entry; Ace Users; Data Discovery]

30 NERDIP: Infrastructure to Lower Barriers to Entry
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Ace Users; Data Discovery; Data Platform]

31 NERDIP: Infrastructure to Lower Barriers to Entry
[NERDIP layer diagram repeated from slide 17, with additional labels in Data Library Layer 1: NetCDF-4 (Climate/Weather/Ocean); NetCDF-4 (Libgdal); [FITS]; Airborne Geophysics Line data [SEG-Y]; BAG; LAS LiDAR. Highlighted labels: Ace Users; Data Discovery; Data Platform]

32 NERDIP: Infrastructure to Lower Barriers to Entry
[NERDIP layer diagram repeated from slide 17. Highlighted labels: Ace Users; Data Discovery]

33 NERDIP: Enabling Multiple Ways to Interact with the Data
[NERDIP layer diagram repeated from slide 17]

34 Key Messages on Big Data in the Geosciences
- Data at scales of today have to be built as shared global facilities based around national institutions.
- Domain-neutral international standards for data collections and interoperability are critical for allowing complex interactions in HP environments, both within and between HPD collections.
- No one can do it alone. No one organisation, no one group, no one country has the required resources or the expertise.
- Shared collaborative efforts such as Research Data Alliance, the Earth Systems Grid Federation (ESGF), the Belmont Forum, EarthServer, the Oceans Data Interoperability Platform (ODIP), EarthCube, G and OneGeology are needed to realise the full potential of the new data intensive science infrastructures.
For it now takes a village of partnerships to raise a HPD data center in a Big Data World.


More information

The Benefits and Challenges in Global Meteorological Satellite Data Sharing

The Benefits and Challenges in Global Meteorological Satellite Data Sharing The Benefits and Challenges in Global Meteorological Satellite Data Sharing Ninghai Sun and Fuzhong Weng Center for Satellite Applica0ons and Research Na0onal Oceanic and Atmospheric Administra0on Presented

More information

Scientific Computing Meets Big Data Technology: An Astronomy Use Case

Scientific Computing Meets Big Data Technology: An Astronomy Use Case Scientific Computing Meets Big Data Technology: An Astronomy Use Case Zhao Zhang AMPLab and BIDS UC Berkeley zhaozhang@cs.berkeley.edu In collaboration with Kyle Barbary, Frank Nothaft, Evan Sparks, Oliver

More information

Research Computing Building Blocks INFRASTRUCTURE FOR DATA AT PURDUE PRESTON SMITH, DIRECTOR OF RESEARCH SERVICES PSMITH@PURDUE.

Research Computing Building Blocks INFRASTRUCTURE FOR DATA AT PURDUE PRESTON SMITH, DIRECTOR OF RESEARCH SERVICES PSMITH@PURDUE. Research Computing Building Blocks INFRASTRUCTURE FOR DATA AT PURDUE PRESTON SMITH, DIRECTOR OF RESEARCH SERVICES PSMITH@PURDUE.EDU Discussion http://www.geartechnology.com/blog/wp- content/uploads/2015/11/opportunity-

More information

High Performance Compu2ng and Big Data. High Performance compu2ng Curriculum UvA- SARA h>p://www.hpc.uva.nl/

High Performance Compu2ng and Big Data. High Performance compu2ng Curriculum UvA- SARA h>p://www.hpc.uva.nl/ High Performance Compu2ng and Big Data High Performance compu2ng Curriculum UvA- SARA h>p://www.hpc.uva.nl/ Big data was big news in 2012 and probably in 2013 too. The Harvard Business Review talks about

More information

EARTH SYSTEM SCIENCE ORGANIZATION Ministry of Earth Sciences, Government of India

EARTH SYSTEM SCIENCE ORGANIZATION Ministry of Earth Sciences, Government of India EARTH SYSTEM SCIENCE ORGANIZATION Ministry of Earth Sciences, Government of India INDIAN INSTITUTE OF TROPICAL METEOROLOGY, PUNE-411008 Advertisement No. PER/03/2012 Opportunities for Talented Young Scientists

More information

Update on the Cloud Demonstration Project

Update on the Cloud Demonstration Project Update on the Cloud Demonstration Project Khalil Yazdi and Steven Wallace Spring Member Meeting April 19, 2011 Project Par4cipants BACKGROUND Eleven Universi1es: Caltech, Carnegie Mellon, George Mason,

More information

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan Development of Earth Science Observational Data Infrastructure of Taiwan Fang-Pang Lin National Center for High-Performance Computing, Taiwan GLIF 13, Singapore, 4 Oct 2013 The Path from Infrastructure

More information

Introduction to ACENET Accelerating Discovery with Computational Research May, 2015

Introduction to ACENET Accelerating Discovery with Computational Research May, 2015 Introduction to ACENET Accelerating Discovery with Computational Research May, 2015 What is ACENET? What is ACENET? Shared regional resource for... high-performance computing (HPC) remote collaboration

More information

BENCHMARKING V ISUALIZATION TOOL

BENCHMARKING V ISUALIZATION TOOL Copyright 2014 Splunk Inc. BENCHMARKING V ISUALIZATION TOOL J. Green Computer Scien

More information

Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future

Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future Baudouin Raoult Principal Software Strategist ECMWF Slide 1 ECMWF An independent intergovernmental organisation established

More information

Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems. Jason Cope copej@mcs.anl.gov

Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems. Jason Cope copej@mcs.anl.gov Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems Jason Cope copej@mcs.anl.gov Computation and I/O Performance Imbalance Leadership class computa:onal scale: >100,000

More information

CMIP5 Data Management CAS2K13

CMIP5 Data Management CAS2K13 CMIP5 Data Management CAS2K13 08. 12. September 2013, Annecy Michael Lautenschlager (DKRZ) With Contributions from ESGF CMIP5 Core Data Centres PCMDI, BADC and DKRZ Status DKRZ Data Archive HLRE-2 archive

More information

Engagement Strategies for Emerging Big Data Collaborations

Engagement Strategies for Emerging Big Data Collaborations Engagement Strategies for Emerging Big Data Collaborations Lauren Rotman, lauren@es.net ESnet Science Engagement Group Lead Lawrence Berkeley National Laboratory APAN 39 th Conference Global Collaborations

More information

Behind the scene III Cloud computing

Behind the scene III Cloud computing Behind the scene III Cloud computing Athens, 15.11.2014 M. Dolenc / R. Klinc Why we do it? Engineering in the cloud is a combina3on of cloud based services and rich interac3ve applica3ons allowing engineers

More information

NASA s Big Data Challenges in Climate Science

NASA s Big Data Challenges in Climate Science NASA s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014 1 2 7-km GEOS-5 Nature Run

More information

NCI ESG install in the cloud

NCI ESG install in the cloud NCI ESG install in the cloud Ben Evans 11 December 2014 @NCInews Installation ESGF installation in the OpenStack cloud. Mix of puppet, esg-install, snapshot, config Production: IDP+portal 4x international

More information

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013 Return on Experience on Cloud Compu2ng Issues a stairway to clouds Experts Workshop Agenda InGeoCloudS SoCware Stack InGeoCloudS Elas2city and Scalability Elas2c File Server Elas2c Database Server Elas2c

More information

Outcomes of the CDS Technical Infrastructure Workshop

Outcomes of the CDS Technical Infrastructure Workshop Outcomes of the CDS Technical Infrastructure Workshop Baudouin Raoult Baudouin.raoult@ecmwf.int Funded by the European Union Implemented by Evaluation & QC function C3S architecture from European commission

More information

OS/Run'me and Execu'on Time Produc'vity

OS/Run'me and Execu'on Time Produc'vity OS/Run'me and Execu'on Time Produc'vity Ron Brightwell, Technical Manager Scalable System SoAware Department Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation,

More information

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management

More information

The Big Integra-on Simula'on Pla,orms for Low Carbon Decision Making

The Big Integra-on Simula'on Pla,orms for Low Carbon Decision Making The Big Integra-on Simula'on Pla,orms for Low Carbon Decision Making Dr. Ma;hias Berger Role of Informa'on & BigData Interac've Tools for Decision Making Urban Planning @ FCL Beyond Smart Ci'es Background

More information

Big Data Processing Experience in the ATLAS Experiment

Big Data Processing Experience in the ATLAS Experiment Big Data Processing Experience in the ATLAS Experiment A. on behalf of the ATLAS Collabora5on Interna5onal Symposium on Grids and Clouds (ISGC) 2014 March 23-28, 2014 Academia Sinica, Taipei, Taiwan Introduction

More information

Luc Declerck AUL, Technology Services Declan Fleming Director, Information Technology Department

Luc Declerck AUL, Technology Services Declan Fleming Director, Information Technology Department Luc Declerck AUL, Technology Services Declan Fleming Director, Information Technology Department What is cyberinfrastructure? Outline Examples of cyberinfrastructure t Why is this relevant to Libraries?

More information

PART 1. Representations of atmospheric phenomena

PART 1. Representations of atmospheric phenomena PART 1 Representations of atmospheric phenomena Atmospheric data meet all of the criteria for big data : they are large (high volume), generated or captured frequently (high velocity), and represent a

More information

Big Data Meets High Performance Computing

Big Data Meets High Performance Computing WHITE PAPER Intel Enterprise Edition for Lustre* Software High Performance Data Division Big Data Meets High Performance Computing Intel Enterprise Edition for Lustre* software and Hadoop combine to bring

More information

The Development of Cloud Interoperability

The Development of Cloud Interoperability NSC- JST Workshop The Development of Cloud Interoperability Weicheng Huang Na7onal Center for High- performance Compu7ng Na7onal Applied Research Laboratories 1 Outline Where are we? Our experiences before

More information

Big Data and Clouds: Challenges and Opportuni5es

Big Data and Clouds: Challenges and Opportuni5es Big Data and Clouds: Challenges and Opportuni5es NIST January 15 2013 Geoffrey Fox gcf@indiana.edu h"p://www.infomall.org h"p://www.futuregrid.org School of Informa;cs and Compu;ng Digital Science Center

More information

Bathymetric Attributed Grids (BAGs): Discovery of Marine Datasets and Geospatial Metadata Visualization

Bathymetric Attributed Grids (BAGs): Discovery of Marine Datasets and Geospatial Metadata Visualization University of New Hampshire University of New Hampshire Scholars' Repository Center for Coastal and Ocean Mapping Center for Coastal and Ocean Mapping 2010 Bathymetric Attributed Grids (BAGs): Discovery

More information

European Data Infrastructure - EUDAT Data Services & Tools

European Data Infrastructure - EUDAT Data Services & Tools European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28

More information

PIT adoption for Climate Data Management

PIT adoption for Climate Data Management PIT adoption for Climate Management The German Climate Computing Center (DKRZ) 5th RDA Plenary Stephan Kindermann Overview DKRZ: A climate science service provider services for the international climate

More information

Data Requirements from NERSC Requirements Reviews

Data Requirements from NERSC Requirements Reviews Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community

More information

MSDI: Workflows, Software and Related Data Standards

MSDI: Workflows, Software and Related Data Standards MSDI: Workflows, Software and Related Data Standards By Andy Hoggarth October 2009 Introduction Leveraging SDI principles for hydrographic operational efficiency French INFRAGEOS example (SHOM - Service

More information

Big Data Volume & velocity data management with ERDAS APOLLO. Alain Kabamba Hexagon Geospatial

Big Data Volume & velocity data management with ERDAS APOLLO. Alain Kabamba Hexagon Geospatial Big Data Volume & velocity data management with ERDAS APOLLO Alain Kabamba Hexagon Geospatial Intergraph is Part of the Hexagon Family Hexagon is dedicated to delivering actionable information through

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Cloud Computing @ JPL Science Data Systems

Cloud Computing @ JPL Science Data Systems Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011 Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1:

More information

Introduction to Red Hat Storage. January, 2012

Introduction to Red Hat Storage. January, 2012 Introduction to Red Hat Storage January, 2012 1 Today s Speakers 2 Heather Wellington Tom Trainer Storage Program Marketing Manager Storage Product Marketing Manager Red Hat Acquisition of Gluster What

More information

Big + Fast + Safe + Simple = Lowest Technical Risk

Big + Fast + Safe + Simple = Lowest Technical Risk Big + Fast + Safe + Simple = Lowest Technical Risk The Synergy of Greenplum and Isilon Architecture in HP Environments Steffen Thuemmel (Isilon) Andreas Scherbaum (Greenplum) 1 Our problem 2 What is Big

More information

High Performance Science Cloud! Meeting the Big Data Challenges of Climate Science!

High Performance Science Cloud! Meeting the Big Data Challenges of Climate Science! ! High Performance Science Cloud! Meeting the Big Data Challenges of Climate Science!! Presentation at the 16 th Workshop on High Performance Computing in Meteorology!! Daniel Duffy 1, John Schnase 2,

More information

Data-centric Renovation of Scientific Workflow in the Age of Big Data

Data-centric Renovation of Scientific Workflow in the Age of Big Data Data-centric Renovation of Scientific Workflow in the Age of Big Data Ryong Lee, Ph. D. ryonglee@kisti.re.kr Dept. of Scientific Big Data Research Korea Institute of Science and Technology Information

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

Big data from Space : esperienze in GEOSS e altre iniziative internazionali

Big data from Space : esperienze in GEOSS e altre iniziative internazionali Big data from Space : esperienze in GEOSS e altre iniziative internazionali PAOLO MAZZETTI, STEFANO NATIVI CNR IIA INSTITUTE OF ATMOSPHERIC POLLUTION RESEARCH, NATIONAL RESEARCH COUNCIL OF ITALY BIG DATA,

More information

Agile Retrieval of Big Data with. EarthServer. ECMWF Visualization Week, Reading, 2015-sep-29

Agile Retrieval of Big Data with. EarthServer. ECMWF Visualization Week, Reading, 2015-sep-29 Agile Retrieval of Big Data with EarthServer ECMWF Visualization Week, Reading, 2015-sep-29 Peter Baumann Jacobs University rasdaman GmbH baumann@rasdaman.com [co-funded by EU through EarthServer, PublicaMundi]

More information

Introduc8on to Apache Spark

Introduc8on to Apache Spark Introduc8on to Apache Spark Jordan Volz, Systems Engineer @ Cloudera 1 Analyzing Data on Large Data Sets Python, R, etc. are popular tools among data scien8sts/analysts, sta8s8cians, etc. Why are these

More information

HYCOM Meeting. Tallahassee, FL

HYCOM Meeting. Tallahassee, FL HYCOM Data Service An overview of the current status and new developments in Data management, software and hardware Ashwanth Srinivasan & Jon Callahan COAPS FSU & PMEL HYCOM Meeting Nov 7-9, 7 2006 Tallahassee,

More information

Preparing Scientists for Scalable Software Development

Preparing Scientists for Scalable Software Development Preparing Scientists for Scalable Software Development Valerie Maxville Education Program Leader About ivec About ivec Five members: Four public universities CSIRO Three facilities Education and Programs

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

ABoVE Science Cloud: An orientation for the ABoVE Science Team

ABoVE Science Cloud: An orientation for the ABoVE Science Team ABoVE Science Cloud: An orientation for the ABoVE Science Team Peter Griffith Chief Support Scientist Liz Hoy Support Scientist Carbon Cycle & Ecosystems Office @NASA_ABoVE Mark McInerney Dan Duffy Data

More information

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015 Data Storage At the Heart of any Information System Ken Claffey, VP/GM - June 2015 Seagate: A Unique Vantage Point on the Data Centre Evolution of the world s digital information End-to-end cloud solutions:

More information

EarthServer @ RDA. RDA: Co-chairing - Big Data IG - Geospatial IG. OGC co-chairing: - editor, Big Geo Data stds - BigData.DWG

EarthServer @ RDA. RDA: Co-chairing - Big Data IG - Geospatial IG. OGC co-chairing: - editor, Big Geo Data stds - BigData.DWG RDA: Co-chairing - Big Data IG - Geospatial IG OGC co-chairing: - editor, Big Geo Data stds - BigData.DWG EarthServer @ RDA RDA-DE/DINI Workshop, Karlsruhe, 2015-may-29 Peter Baumann Jacobs University

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

«Shanoir : une solu/on pour la ges/on de données distribuées en imagerie in- vivo» Jus/ne Guillaumont Isabelle Corouge

«Shanoir : une solu/on pour la ges/on de données distribuées en imagerie in- vivo» Jus/ne Guillaumont Isabelle Corouge «Shanoir : une solu/on pour la ges/on de données distribuées en imagerie in- vivo» Jus/ne Guillaumont Isabelle Corouge Shanoir: a solu-on for neuro- imaging data management Jus/ne Guillaumont, Isabelle

More information

ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P.

ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. Kushner, D. Waliser, S. Pascoe, A. Stephens, P. Kershaw,

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information

GEONETCast delivering environmental data to users worldwide (September 2007)

GEONETCast delivering environmental data to users worldwide (September 2007) GEONETCast delivering environmental data to users worldwide (September 2007) Lothar Wolf 1, Michael Williams 2 Abstract GEONETCast, a near real-time global, environmental information delivery system by

More information

BMW11: Dealing with the Massive Data Generated by Many-Core Systems. Dr Don Grice. 2011 IBM Corporation

BMW11: Dealing with the Massive Data Generated by Many-Core Systems. Dr Don Grice. 2011 IBM Corporation BMW11: Dealing with the Massive Data Generated by Many-Core Systems Dr Don Grice IBM Systems and Technology Group Title: Dealing with the Massive Data Generated by Many Core Systems. Abstract: Multi-core

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Global Terrestrial Network HYDROLOGY (GTN-H)

Global Terrestrial Network HYDROLOGY (GTN-H) Terrestrial Network HYDROLOGY (GTN-H) 1 GEO GTN - H GCOS IGWCO 2 Main Objectives Make available data from existing global hydrological observation networks and enhance their value through integration Generation

More information

Selected Ac)vi)es Overseen by the WDAC

Selected Ac)vi)es Overseen by the WDAC Selected Ac)vi)es Overseen by the WDAC Some background obs4mips ana4mips CREATE- IP (in planning stage) Contribu)ons from: M. Bosilovich, O. Brown, V. Eyring, R. Ferraro., P. Gleckler, R. Joseph, J. PoQer,

More information

Workshop on Exascale Data Management, Analysis, and Visualiza=on Houston TX 2/22/2011 Scott A. Klasky

Workshop on Exascale Data Management, Analysis, and Visualiza=on Houston TX 2/22/2011 Scott A. Klasky Workshop on Exascale Data Management, Analysis, and Visualiza=on Houston TX 2/22/2011 Scott A. Klasky klasky@ornl.gov ORNL: Q. Liu, J. Logan, N. Podhorszki, R. Tchoua Georgia Tech: H. Abbasi, G. Eisehnhauer,

More information

CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk

CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk CEDA Storage Dr Matt Pritchard Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk How we store our data NAS Technology Backup JASMIN/CEMS CEDA Storage Data stored as files on disk. Data is migrated

More information

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS data analysis José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS Cluster definition: A computer cluster is a group of linked computers, working

More information

PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D. PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD Natasha Balac, Ph.D. Brief History of SDSC 1985-1997: NSF national supercomputer center; managed by General Atomics

More information

Tim Killeen President Designate, University of Illinois April 29, 2015

Tim Killeen President Designate, University of Illinois April 29, 2015 Tim Killeen President Designate, University of Illinois April 29, 2015 Introduction Compelling requirements National and International Efforts Obstacles and Needs Professor of Atmospheric, Oceanic, and

More information

Adding Big Earth Data Analytics to GEOSS

Adding Big Earth Data Analytics to GEOSS Research funded through EU FP7 283610 EarthServer European Scalable Earth Science Service Environment Adding Big Earth Data Analytics to GEOSS GEO IX Plenary Foz do Iguacu, 2012-nov-20 Peter Baumann, Stefano

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Interaction with other IT projects: EUDAT2020, VLDATA, ENVRI PLUS,

Interaction with other IT projects: EUDAT2020, VLDATA, ENVRI PLUS, Interaction with other IT projects: EUDAT2020, VLDATA, ENVRI PLUS, A. Spinuso, L. Trani, A. Strollo and D. Bailo EPOS PP final meeting, Rome, 22-24 October 2014 OUTLINE WG1 and the EIDA use case A modular

More information

Introduction to DKRZ and to the Visualization Server halo

Introduction to DKRZ and to the Visualization Server halo Introduction to DKRZ and to the Visualization Server halo Michael Böttinger Niklas Röber Deutsches Klimarechenzentrum -------------------------------------------------------------- German Climate Computing

More information

320473 Databases & Web Applications Lab 320454 Big Data Project A

320473 Databases & Web Applications Lab 320454 Big Data Project A 320473 Databases & Web Applications Lab 320454 Big Data Project A Instructor: Peter Baumann email: p.baumann@jacobs-university.de tel: -3178 office: room 88, Research 1 320302 Databases & Web Applications

More information

Predictive Community Computational Tools for Virtual Plasma Science Experiments

Predictive Community Computational Tools for Virtual Plasma Science Experiments Predictive Community Computational Tools for Virtual Plasma Experiments J.-L. Vay, E. Esarey, A. Koniges Lawrence Berkeley National Laboratory J. Barnard, A. Friedman, D. Grote Lawrence Livermore National

More information

Extreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1

Extreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1 Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing

More information