It's not just about big data for the Earth and Environmental Sciences: it's now about High Performance Data (HPD)

1 It's not just about big data for the Earth and Environmental Sciences: it's now about High Performance Data (HPD). Lesley Wyborn, Geoscience Australia. New Petascale Raijin Computer at NCI.

2 Outline of the Big Data Problem in Earth and Environmental Sciences. We know we have a Big Data problem. But have we nailed what the Big Data problem is? Until we do, we could waste a lot of resources. This presentation is about trying to nail what the Big Data problem is for the Earth and Environmental Sciences, and showing exemplars of how we are addressing it.

3 My take is that Big Data is not just about the Vs
1. Volume: data at rest
2. Velocity: data in motion (streaming)
3. Variety: many types, forms and structures or no structures
4. Veracity: trustworthiness, provenance, lineage, quality
5. Validity: data that is correct
6. Visualization: data in patterns
7. Vulnerability: data at risk
8. Value: data that is meaningful
9. V?????
10. V?????

4 Big data affects all stages of the Earth and Environmental Scientific Workflow. Stages shown: Acquire; Store & Manage; Integrate 2/3/4D; Model, Simulate & Analyse 2/3/4D; Deliver. Slide courtesy of Bruce Kilgour, Geoscience Australia.

5 But why is the Big Data Problem so Big for Earth and Environmental sciences??? Earth and Environmental Sciences were actually early adopters of computation: are they now locked into old technologies??? Although there are PBs of data, it is locked into small file sizes. Is this the 32-bit legacy of the 2 GB file size limit (file sizes are often 1, 2, or 4.71 GB)??? Earth and environmental sciences are also plagued by the long tail problem.
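
As a rough check on the 32-bit question raised on this slide: a signed 32-bit file offset can address at most 2^31 - 1 bytes, which is just under 2 GiB. The sketch below works that out and estimates how many files of that size a 4 PB holding (the Geoscience Australia figure quoted later in the talk) would need. Decimal petabytes and files filled right up to the limit are assumptions made for illustration only.

```python
# Largest byte offset a signed 32-bit integer can hold.
MAX_SIGNED_32 = 2**31 - 1          # 2,147,483,647 bytes

GIB = 2**30                        # binary gigabyte
PB = 10**15                        # decimal petabyte (assumption)


def files_needed(total_bytes: int, max_file_bytes: int = MAX_SIGNED_32) -> int:
    """Minimum number of files if every file is filled to the limit."""
    return -(-total_bytes // max_file_bytes)   # ceiling division


limit_gib = MAX_SIGNED_32 / GIB
n_files = files_needed(4 * PB)
print(f"32-bit limit: {limit_gib:.3f} GiB per file")
print(f"4 PB needs at least {n_files:,} files")
```

Even in this best case the archive ends up as well over a million separate files, which is the fragmentation the slide is pointing at.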

6 Environmental and Earth Sciences do have high proportions of Long Tail Data. The Head: Astronomy, Climate, High Energy Physics, Genomics. The Long Tail: Environmental and Earth sciences. Long Tail Characteristics: more specialised; low volume; on C drives; hard to find; heterogeneous; collected by large numbers of people; citizen science; etc.

7 The Advanced ICT Tetrahedron in balance: High Performance Computing; Content (Data, Information, Knowledge); Tools; Bandwidth

8 The Advanced ICT Tetrahedron in 2013: High Performance Computing; Bandwidth; Content (Data, Information, Knowledge); Tools, Codes

9 Evolution of Peak Facilities at NCI/APAC
Columns: System (Top500 rank); Procs/Cores; Memory; Disk; Peak Perf. (Tflops); Sustained Perf. (SPEC)
Compaq Alphaserver (31); 512; 0.5 Tbyte; 12 Tbytes; 1 TFlop; 2,000
SGI Altix 3700 (26); 1920; 5.5 Tbytes; 30 (+70) Tbytes; 14 Tflops; 21,000
2008-12: SGI Altix XE (L); 1248; 2.5 Tbytes; 90 Tbytes; 14 TFlops; 12,000
Sun Constellation (35); 11,936; 37 Tbytes; 800 Tbytes; 140 TFlops; 251,000
2013: Fujitsu Petascale System; 57,472; 160 Tbytes; 10 Pbytes; 1200 Tflops; 1,600,000
Botten, Evans, CSIRO CSS, 22 March, 2012

10 We need to capitalise on DIISRTE investments in eResearch Infrastructure, in particular the 2 Petascale computers (NCI, Pawsey) and the NeCTAR Cloud. Figure: graph of usage of GA share since Q4 2010, by quarter, in ksu; series: GA Share, Request, Usage; systems: Vayu, Raijin.

11 Australian HPC in Top 500: June 2013
Tiers: External: Tier 0 (Top 10), Tier 1 (Top 500). Internal: Tier 2, Tier 3.
Scale classes: Petascale: >100,000 cores; Terascale: >10,000 cores; Gigascale: >1,000 cores; Megascale: >100 cores; Desktop: 2-8 cores.
Top 500 entries: No 1: PFLOPS; No 10: 2.90 PFLOPS; No 27: NCI (979 TFlops); No 39: LS Vic (715 TFlops); No 289: CSIRO (133 TFlops); No 320: NCI Vayu (126 TFlops); No 460: Defence (102 TFlops); No 500 (96.62 TFLOPS).
Facility labels: Institutional Facilities; Local Machines and Clusters; Grid, Cloud; Local Condor Pools; GA usage.
Based on European Climate Computing Environments, Bryan Lawrence, and Top 500 list November 2011.

12 Given GA has 4 PBs of data, what behavioural characteristics do camels and GA have in common? The Camel. Geoscience Australia.

13 Getting 4 PB of data out through a 100 Mb/s link is like getting a camel through the eye of a needle, which has exacerbated the 2 GB limit.
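
To put a number on the camel and the needle, here is a back-of-envelope calculation of the ideal transfer time. It assumes decimal units (1 PB = 10^15 bytes, 100 Mb/s = 10^8 bits per second), a fully dedicated link and no protocol overhead, so it is a lower bound rather than a measurement.

```python
def transfer_time_seconds(n_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time: no protocol overhead, link fully dedicated."""
    return n_bytes * 8 / link_bits_per_s


PB = 1e15                                  # decimal petabyte (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = transfer_time_seconds(4 * PB, 100e6)   # 4 PB over 100 Mb/s
years = seconds / SECONDS_PER_YEAR
print(f"{years:.1f} years")
```

Under these assumptions the transfer takes on the order of a decade, which is the practical case for moving the compute to the data rather than the data to the compute.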

14 The real meaning of Big Data. It is not about increasing bandwidth or having/distributing data into smaller packets (where do you store it?). It is about bringing the people, the tools and the compute to the data.

15 Increase Model Complexity: Monte Carlo simulations, ensemble runs. Increase Model Size: single passes at larger scales; more ensemble members. Timescale: use longer duration runs; use more and shorter time intervals. Increase Data Resolution: use higher resolution data. Speed up data access: self-describing data cubes and data arrays. Scale labels: Local; Giga; Terascale; Petascale. Based on European Climate Computing Environments, Bryan Lawrence.

16 The data aggregation problem in climate. Figure labels: 6th assessment; 5th assessment; 4th assessment; 3rd assessment (2001). Slide courtesy of Andy Pitman, COE Climate System Science.

17 We now emphasise Big Data vs High Performance Data (HPD). Diagram labels: Big Data as is; HPD in the Future; Remote Sensing specialists; Everyone else; Discovery and delivery layer (Authentication, billing etc). Data levels: Raw observations; Scenes; Pixels. Processing steps: Process to scenes; Process to standardised nested grid of pixels. Plot: Dam Inundation (%) against Time (axis ticks from /11/03 to 17/02/05).

18 Seasonal changes in Lake Disappointment, WA: July 1999 to January 2000: traditional approach scene by scene

19 But to extract this information: EO product process. Search catalog, order scenes. Identify footprint of product in space or time. Client requests product. 1 Petabyte hierarchical archive: millions of individual scenes in a tape store that is accessed by robot. Orthorectification, calibration, cloud masking, atmospheric correction, mosaicing. Feature extraction, algorithm application, spectral unmixing. Product packaging and delivery.

20 Cubing Landsat images. Diagram labels: Tile squares; Landsat images; time; Dice &
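
The idea of cubing can be sketched in a few lines: cut every scene into fixed tile squares, then stack the squares for each tile in time order, so that a time series for one pixel is read from a single stack instead of from many separate scenes. This is a toy illustration, not the production system: the tile size, the function names and the two 4x4 scenes are all hypothetical, and the scenes are assumed to be co-registered on the same grid.

```python
from collections import defaultdict
from datetime import date

TILE = 2  # tile edge length in pixels (tiny, for illustration only)


def dice(scene, tile=TILE):
    """Cut one 2D scene (list of rows) into tile x tile squares.

    Yields ((tile_row, tile_col), square) pairs.
    Assumes scene dimensions are exact multiples of the tile size.
    """
    n_rows, n_cols = len(scene), len(scene[0])
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            square = [row[c0:c0 + tile] for row in scene[r0:r0 + tile]]
            yield (r0 // tile, c0 // tile), square


def build_cube(scenes):
    """Stack diced scenes by tile, ordered by acquisition date."""
    cube = defaultdict(list)
    for when, scene in sorted(scenes, key=lambda s: s[0]):
        for key, square in dice(scene):
            cube[key].append((when, square))
    return cube


def pixel_series(cube, row, col, tile=TILE):
    """Time series for one pixel, read from a single tile stack."""
    stack = cube[(row // tile, col // tile)]
    return [(when, sq[row % tile][col % tile]) for when, sq in stack]


# Two hypothetical 4x4 co-registered scenes (values are made up).
scenes = [
    (date(2000, 1, 1), [[r * 4 + c + 100 for c in range(4)] for r in range(4)]),
    (date(1999, 7, 1), [[r * 4 + c for c in range(4)] for r in range(4)]),
]
cube = build_cube(scenes)
series = pixel_series(cube, row=3, col=1)
print(series)
```

The design point is that the expensive reorganisation happens once, at ingest, and every later time-series query touches only the tiles it needs.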

21 Menindee Lakes: Surface water. Menindee Lakes time series. Total observations per grid cell: ~ *4000 grid cells scenes (58 years to retrieve data). 91 TB of netCDF data files on spinning disk.

22 The ASTER HPD Array: Facilitating Online Data Analysis. Seamless coverage of 3500 scenes, each 60 km * 60 km, selected from an archive of 35,000 scenes. Available at national and local scales on the AuScope portal. 17 layers * 60 GB = 1.2 TB.

23 The bonsai effect: degrading our data

24 We don't degrade photographic images, so why do we do this to our science?

25 High Performance Point Data Sets

26 Resolution impacts on file size: e.g. Magnetics
Columns: Version; Year; Grid cell size; Data file size
m; 0.49 GB
m; 0.94 GB
m; 9.73 GB
(?); <80 m; 3 TB
(Slide courtesy of Murray Richardson)
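
The growth in file size with resolution follows from simple geometry: for a fixed extent, the number of grid cells scales with the inverse square of the cell size, so halving the cell size roughly quadruples an uncompressed grid. The sketch below illustrates only that scaling. The extent, the cell sizes and the 4 bytes per cell are hypothetical values, not the figures of the magnetics compilations on the slide.

```python
import math


def grid_bytes(extent_m: float, cell_m: float, bytes_per_cell: int = 4) -> int:
    """Uncompressed size of a square single-band grid."""
    cells_per_side = math.ceil(extent_m / cell_m)
    return cells_per_side ** 2 * bytes_per_cell


EXTENT_M = 4_000_000                  # hypothetical 4000 km square extent
coarse = grid_bytes(EXTENT_M, 400)    # hypothetical 400 m cells
fine = grid_bytes(EXTENT_M, 200)      # halve the cell size
ratio = fine / coarse
print(f"{coarse / 1e9:.2f} GB -> {fine / 1e9:.2f} GB (x{ratio:.0f})")
```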

27 The fundamentals of Big Earth & Environmental Data: a common coordinate reference system

28 Put simply: we know the earth is not flat.

29 Put simply: but our maps are still flat.

30 Figure: map panels A to F, spanning 120°E to 140°E and 20°S to 30°S. AusMoho map from Kennett et al. (2011); Moho from CRUST2.0.

31 rHEALPix: A discrete global grid system. HEALPix = Hierarchical Equal Area isoLatitude Pixelisation of a sphere. rHEALPix = Hierarchical Equal Area isoLatitude Pixelisation on an ellipsoid of revolution.
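
To give a feel for what a hierarchical equal-area grid means in practice, the sketch below computes how many cells there are at a given resolution level and how large each one is. It assumes the usual rHEALPix construction of six base cells, each subdivided into N_side x N_side children per level, and it assumes the WGS84 ellipsoid. Neither the N_side value nor the ellipsoid is stated on the slide, so treat both as illustrative.

```python
import math

# WGS84 ellipsoid parameters (assumed; the slide does not name an ellipsoid).
A = 6378137.0                    # semi-major axis, metres
F = 1 / 298.257223563            # flattening


def ellipsoid_area(a: float = A, f: float = F) -> float:
    """Surface area of an oblate ellipsoid of revolution, in m^2."""
    e2 = f * (2 - f)             # eccentricity squared
    e = math.sqrt(e2)
    return 2 * math.pi * a**2 * (1 + (1 - e2) / e * math.atanh(e))


def cell_count(level: int, n_side: int = 3) -> int:
    """Cells at a resolution level: 6 base cells, n_side^2 children each."""
    return 6 * n_side ** (2 * level)


def cell_area_km2(level: int, n_side: int = 3) -> float:
    """Equal-area cells, so each one gets an equal share of the surface."""
    return ellipsoid_area() / cell_count(level, n_side) / 1e6


for level in range(0, 13, 4):
    print(level, cell_count(level), f"{cell_area_km2(level):.6g} km^2")
```

Because every cell at a level has the same area, per-cell statistics are directly comparable anywhere on the globe, which a flat latitude/longitude grid cannot offer.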

32 Introducing The Virtual Geophysics Laboratory

33 Before VGL: The workflow
1. Select dataset and download (GADDS)
2. Process data and grid (Intrepid)
3. Image processing and reprojection (ERMapper)
4. Export data as CSV and add uncertainty using MATLAB
5. Write UBC-GIF or escript.downunder script files
6. Transfer data and files to the NCI
7. Wait
8. Download results
9. Import into GOCAD for viewing
No fewer than 6 different tools or applications. No provenance recorded.
Provenance and VGL

34 The Computational Science Workflow Data + Methods + Resources

35 The Virtual Laboratory Jigsaw

36 Data discovery. Layers are discovered via remote registries. Layers consist of numerous remote data services.

37 Data processing. Flexibility in what computing resources to utilise. A variety of different scientific codes are already available in the form of Toolboxes.

38 Data processing. Input files are passed directly into the cloud. Further input files can be uploaded.

39 Data processing. The steps so far have been building an environment to run a processing script. Either write your own... or build from existing templates.

40 Managing results - provenance. A job's console log can be inspected. All of a job's outputs are also accessible. Each job has a lifecycle that can be managed.

41 Managing results - provenance. Each provenance record tracks all inputs, outputs, processing scripts and other metadata... Spatial bounds... Input/output data... Successful jobs can have their entire process captured in an ISO provenance record.
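
A provenance record of the kind described on this slide can be pictured as a small structured document that travels with the job results. The sketch below is a hypothetical layout only: the class, its field names, the file names and the JSON serialisation are assumptions made for illustration, not the VGL schema or the ISO metadata encoding the slide refers to.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceRecord:
    """Hypothetical record layout; field names are illustrative, not VGL's."""
    job_id: str
    inputs: list            # identifiers or URLs of input data
    outputs: list           # identifiers of files the job produced
    script: str             # full text of the processing script
    bounds: dict            # spatial bounds: west, south, east, north
    metadata: dict = field(default_factory=dict)

    def script_digest(self) -> str:
        """Fingerprint of the script, so later edits are detectable."""
        return hashlib.sha256(self.script.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        """Serialise the record, adding the script fingerprint."""
        doc = asdict(self)
        doc["script_sha256"] = self.script_digest()
        return json.dumps(doc, sort_keys=True, indent=2)


record = ProvenanceRecord(
    job_id="job-0001",                               # made-up identifier
    inputs=["example-magnetics-subset.nc"],          # made-up file names
    outputs=["inversion-result.nc", "console.log"],
    script="print('run inversion')",
    bounds={"west": 120.0, "south": -30.0, "east": 140.0, "north": -20.0},
    metadata={"status": "successful"},
)
print(record.to_json())
```

Storing the script text and its digest alongside the inputs and outputs is what makes a run repeatable later, which is the gap the pre-VGL workflow left open.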

42

43 Components of a Virtual Laboratory: Data Services; Processing Services; Compute Services; Enablers (e.g. OGC "Glue"); Virtual Laboratory. CSIRO Earth Observation Informatics TCP Workshop, April 2013

44 Components of the Virtual Geophysics Laboratory. Data Services: Magnetics; Gravity; DEM. Processing Services: escript; Underworld. Compute Services: NCI Petascale; NCI Cloud; NeCTAR Cloud; Amazon Cloud. Enablers (e.g. OGC "Glue"): Service Orchestration; VGL Portal; Provenance Metadata; Scripting Tool. Dynamic Virtual Geophysics Laboratories, labels as shown: NCI Cloud; escript; Mag. Grav.; VGL Portal; Mag. Grav.; NCI Petascale; VGL Portal; DEM; NCI Cloud; Underworld; Desktop.

45 Repurposing to a Virtual Hazards Laboratory. Data Services: Magnetics; Gravity; DEM; Landsat. Processing Services: Unchanged; ANUGA; EQRM. Compute Services: NCI Petascale; NCI Cloud; NeCTAR Cloud; Amazon Cloud. Enablers (e.g. OGC "Glue"): Service Orchestration; VGL Portal; Provenance Metadata; Scripting Tool. Dynamic Virtual Hazards Laboratories, labels as shown: NCI Petascale; Mag. Grav.; VGL Portal; EQRM; DEM; NCI Cloud; Bathy DEM; VGL Portal; Amazon Cloud; ANUGA; Bathymetry; Desktop.

46 Repurposing to a Virtual Environmental Laboratory. Data Services: Climate Records; Species; DEM; Landsat. Processing Services: Unchanged; Wind Modelling; Land Use Analytics. Compute Services: NCI Petascale; NCI Cloud; NeCTAR Cloud; Amazon Cloud. Enablers (e.g. OGC "Glue"): Service Orchestration; VGL Portal; Provenance Metadata; Scripting Tool. Dynamic Virtual Environmental Laboratories, labels as shown: Amazon Cloud; Sat.; Species; VGL Portal; Bug tracking; DEM; NCI HPC; Weather DEM; VGL Portal; Amazon Cloud; Tsunami; Bathymetry; Desktop.

47 Any Questions? Phone: Web: Email: lesley.wyborn@ga.gov.au. Address: Cnr Jerrabomberra Avenue and Hindmarsh Drive, Symonston ACT 2609. Postal Address: GPO Box 378, Canberra ACT 2601.

High Performance Compu2ng and High Performance Data: exploring the growing use of Supercomputers in Oil and Gas Explora2on & Produc2on

High Performance Compu2ng and High Performance Data: exploring the growing use of Supercomputers in Oil and Gas Explora2on & Produc2on High Performance Compu2ng and High Performance Data: exploring the growing use of Supercomputers in Oil and Gas Explora2on & Produc2on Lesley Wyborn1, Ben Evans1, David Lescinsky2 and Clinton Foster2 16

More information

Natural Hazard Risk Assessment in the Australasian Region: Informing Disaster Risk Reduction and Building Community Resilience.

Natural Hazard Risk Assessment in the Australasian Region: Informing Disaster Risk Reduction and Building Community Resilience. Natural Hazard Risk Assessment in the Australasian Region: Informing Disaster Risk Reduction and Building Community Resilience Jane Sexton Australia and Disaster Risk Reduction Australian Context Understanding

More information

Perspec'ves on Big Data in the Geosciences from a major Australian na'onal data center

Perspec'ves on Big Data in the Geosciences from a major Australian na'onal data center Perspec'ves on Big Data in the Geosciences fro a ajor Australian na'onal data center Lesley Wyborn @NCInews The Data Tsunai has not yet landed The 43 PB on the Research Data Storage Infrastructure 10 PBytes

More information

Driving Earth Systems Collaboration across the Pacific

Driving Earth Systems Collaboration across the Pacific Driving Earth Systems Collaboration across the Pacific Tim F. Pugh Centre for Australian Weather and Climate Research Australian Bureau of Meteorology What s our business? Earth System Science Study of

More information

Cloud Computing @ JPL Science Data Systems

Cloud Computing @ JPL Science Data Systems Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011 Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1:

More information

Research Collaboration in the Cloud: - the NeCTAR Research Cloud

Research Collaboration in the Cloud: - the NeCTAR Research Cloud Research Collaboration in the Cloud: - the NeCTAR Research Cloud National eresearch Collaboration Tools and Resources nectar.org.au NeCTAR is an initiative of the Australian Government being conducted

More information

SURFsara Data Services

SURFsara Data Services SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,

More information

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013 ASKAP Science Data Archive Talk outline Data flow in brief Some radio

More information

Outcomes of the CDS Technical Infrastructure Workshop

Outcomes of the CDS Technical Infrastructure Workshop Outcomes of the CDS Technical Infrastructure Workshop Baudouin Raoult Baudouin.raoult@ecmwf.int Funded by the European Union Implemented by Evaluation & QC function C3S architecture from European commission

More information

Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)

Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania) Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania) Outline Introduction EO challenges; EO and classical/cloud computing; EO Services The computing platform Cluster -> Grid -> Cloud

More information

National eresearch Collaboration Tools and Resources nectar.org.au

National eresearch Collaboration Tools and Resources nectar.org.au National eresearch Collaboration Tools and Resources nectar.org.au NeCTAR is an Australian Government project conducted as part of the Super Science initiative and financed by the Education Investment

More information

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August

More information

Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis

Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis Catharine van Ingen 1, Jie Li 2, Youngryel Ryu 3, Marty Humphrey 2, Deb Agarwal 4, Keith Jackson

More information

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University

More information

NASA's Strategy and Activities in Server Side Analytics

NASA's Strategy and Activities in Server Side Analytics NASA's Strategy and Activities in Server Side Analytics Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at the ESGF/UVCDAT Conference Lawrence Livermore National Laboratory

More information

Information and Communications Technology Strategy 2014-2017

Information and Communications Technology Strategy 2014-2017 Contents 1 Background ICT in Geoscience Australia... 2 1.1 Introduction... 2 1.2 Purpose... 2 1.3 Geoscience Australia and the Role of ICT... 2 1.4 Stakeholders... 4 2 Strategic drivers, vision and principles...

More information

Concepts and Architecture of Grid Computing. Advanced Topics Spring 2008 Prof. Robert van Engelen

Concepts and Architecture of Grid Computing. Advanced Topics Spring 2008 Prof. Robert van Engelen Concepts and Architecture of Grid Computing Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Grid users: who are they? Concept of the Grid Challenges for the Grid Evolution of Grid systems

More information

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

More information

Data Requirements from NERSC Requirements Reviews

Data Requirements from NERSC Requirements Reviews Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community

More information

SURFsara HPC Cloud Workshop

SURFsara HPC Cloud Workshop SURFsara HPC Cloud Workshop doc.hpccloud.surfsara.nl UvA workshop 2016-01-25 UvA HPC Course Jan 2016 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current

More information

Big Data Volume & velocity data management with ERDAS APOLLO. Alain Kabamba Hexagon Geospatial

Big Data Volume & velocity data management with ERDAS APOLLO. Alain Kabamba Hexagon Geospatial Big Data Volume & velocity data management with ERDAS APOLLO Alain Kabamba Hexagon Geospatial Intergraph is Part of the Hexagon Family Hexagon is dedicated to delivering actionable information through

More information

Big Data Infrastructures for Processing Sentinel Data

Big Data Infrastructures for Processing Sentinel Data Big Data Infrastructures for Processing Sentinel Data Wolfgang Wagner Department for Geodesy and Geoinformation Technische Universität Wien Earth Observation Data Centre for Water Resources Monitoring

More information

JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill

JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill Philip Kershaw (1, 2), Jonathan Churchill (5), Bryan Lawrence (1, 3, 4), Stephen Pascoe (1, 4) and MaE Pritchard (1) Centre

More information

Digital Preservation Lifecycle Management

Digital Preservation Lifecycle Management Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar San Diego Supercomputer Center, University of California,

More information

Big Data Services at DKRZ

Big Data Services at DKRZ Big Data Services at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scientific Computing Research Group MPI-M Seminar Hamburg, March 31st, 2015 Big Data in Climate Research Big data is an all-encompassing

More information

Globus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis

Globus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis Globus Research Data Management: Introduction and Service Overview Steve Tuecke Vas Vasiliadis Presentations and other useful information available at globus.org/events/xsede15/tutorial 2 Thank you to

More information

The USGS Landsat Big Data Challenge

The USGS Landsat Big Data Challenge The USGS Landsat Big Data Challenge Brian Sauer Engineering and Development USGS EROS bsauer@usgs.gov U.S. Department of the Interior U.S. Geological Survey USGS EROS and Landsat 2 Data Utility and Exploitation

More information

Remote Sensing Method in Implementing REDD+

Remote Sensing Method in Implementing REDD+ Remote Sensing Method in Implementing REDD+ FRIM-FFPRI Research on Development of Carbon Monitoring Methodology for REDD+ in Malaysia Remote Sensing Component Mohd Azahari Faidi, Hamdan Omar, Khali Aziz

More information

Big Data and Cloud Computing for GHRSST

Big Data and Cloud Computing for GHRSST Big Data and Cloud Computing for GHRSST Jean-Francois Piollé (jfpiolle@ifremer.fr) Frédéric Paul, Olivier Archer CERSAT / Institut Français de Recherche pour l Exploitation de la Mer Facing data deluge

More information

The ORIENTGATE data platform

The ORIENTGATE data platform Seminar on Proposed and Revised set of indicators June 4-5, 2014 - Belgrade (Serbia) The ORIENTGATE data platform WP2, Action 2.4 Alessandra Nuzzo, Sandro Fiore, Giovanni Aloisio Scientific Computing and

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

NetCDF and HDF Data in ArcGIS

NetCDF and HDF Data in ArcGIS 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop NetCDF and HDF Data in ArcGIS Nawajish Noman Kevin Butler Esri UC2013. Technical Workshop. Outline NetCDF

More information

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013 Virtualisation Cloud Computing at the RAL Tier 1 Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013 Virtualisation @ RAL Context at RAL Hyper-V Services Platform Scientific Computing Department

More information

Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS

Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS Workshop on the Future of Big Data Management 27-28 June 2013 Philip Kershaw Centre for Environmental Data Archival

More information

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18 The Mantid Project The challenges of delivering flexible HPC for novice end users Nicholas Draper SOS18 What Is Mantid A framework that supports high-performance computing and visualisation of scientific

More information

SURFsara HPC Cloud Workshop

SURFsara HPC Cloud Workshop SURFsara HPC Cloud Workshop www.cloud.sara.nl Tutorial 2014-06-11 UvA HPC and Big Data Course June 2014 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current

More information

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014 Best Practices for Data Management RMACC HPC Symposium, 8/13/2014 Presenters Andrew Johnson Research Data Librarian CU-Boulder Libraries Shelley Knuth Research Data Specialist CU-Boulder Research Computing

More information

Cloud-based Infrastructures. Serving INSPIRE needs

Cloud-based Infrastructures. Serving INSPIRE needs Cloud-based Infrastructures Serving INSPIRE needs INSPIRE Conference 2014 Workshop Sessions Benoit BAURENS, AKKA Technologies (F) Claudio LUCCHESE, CNR (I) June 16th, 2014 This content by the InGeoCloudS

More information

MSDI: Workflows, Software and Related Data Standards

MSDI: Workflows, Software and Related Data Standards MSDI: Workflows, Software and Related Data Standards By Andy Hoggarth October 2009 Introduction Leveraging SDI principles for hydrographic operational efficiency French INFRAGEOS example (SHOM - Service

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent

More information

Advanced Image Management using the Mosaic Dataset

Advanced Image Management using the Mosaic Dataset Esri International User Conference San Diego, California Technical Workshops July 25, 2012 Advanced Image Management using the Mosaic Dataset Vinay Viswambharan, Mike Muller Agenda ArcGIS Image Management

More information

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Achim Streit Steinbuch Centre for Computing (SCC) KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum

More information

Digital Earth: Big Data, Heritage and Social Science

Digital Earth: Big Data, Heritage and Social Science Digital Earth: Big Data, Heritage and Social Science The impact on geographic information and GIS Geographic Information Systems Analysis for Decision Support Impact of Big Data Digital Earth Citizen Engagement

More information

BLACKBRIDGE SATELLITE IMAGERY THROUGH CLOUD COMPUTING

BLACKBRIDGE SATELLITE IMAGERY THROUGH CLOUD COMPUTING BLACKBRIDGE SATELLITE IMAGERY THROUGH CLOUD COMPUTING Jason Setzer Cloud Product Manager Slide 1 THE RAPID EYE CONSTELLATION 5 Identical Satellites in same obit Up to 5 million km² collected daily 1 billion

More information

Enhanced Research Data Management and Publication with Globus

Enhanced Research Data Management and Publication with Globus Enhanced Research Data Management and Publication with Globus Vas Vasiliadis Jim Pruyne Presented at OR2015 June 8, 2015 Presentations and other useful information available at globus.org/events/or2015/tutorial

More information

PART 1. Representations of atmospheric phenomena

PART 1. Representations of atmospheric phenomena PART 1 Representations of atmospheric phenomena Atmospheric data meet all of the criteria for big data : they are large (high volume), generated or captured frequently (high velocity), and represent a

More information

Scalable stochastic tracing of distributed data management events

Scalable stochastic tracing of distributed data management events Scalable stochastic tracing of distributed data management events Mario Lassnig mario.lassnig@cern.ch ATLAS Data Processing CERN Physics Department Distributed and Parallel Systems University of Innsbruck

More information

Introduction to Imagery and Raster Data in ArcGIS

Introduction to Imagery and Raster Data in ArcGIS Esri International User Conference San Diego, California Technical Workshops July 25, 2012 Introduction to Imagery and Raster Data in ArcGIS Simon Woo slides Cody Benkelman - demos Overview of Presentation

More information

Data Management Framework for the North American Carbon Program

Data Management Framework for the North American Carbon Program Data Management Framework for the North American Carbon Program Bob Cook, Peter Thornton, and the Steering Committee Image courtesy of NASA/GSFC NACP Data Management Planning Workshop New Orleans, LA January

More information

Data Management and Analysis in Support of DOE Climate Science

Data Management and Analysis in Support of DOE Climate Science Data Management and Analysis in Support of DOE Climate Science August 7 th, 2013 Dean Williams, Galen Shipman Presented to: Processing and Analysis of Very Large Data Sets Workshop The Climate Data Challenge

More information

Of the above, the seven marked in bold are a particular focus for this Status Report.

Of the above, the seven marked in bold are a particular focus for this Status Report. Status Report on the National eresearch Capability The Australian Department of Education is developing a report on the status of national eresearch infrastructure in Australia, funded under the National

More information

Improved metrics collection and correlation for the CERN cloud storage test framework

Improved metrics collection and correlation for the CERN cloud storage test framework Improved metrics collection and correlation for the CERN cloud storage test framework September 2013 Author: Carolina Lindqvist Supervisors: Maitane Zotes Seppo Heikkila CERN openlab Summer Student Report

More information

Hue Streams. Seismic Compression Technology. Years of my life were wasted waiting for data loading and copying

Hue Streams. Seismic Compression Technology. Years of my life were wasted waiting for data loading and copying Hue Streams Seismic Compression Technology Hue Streams real-time seismic compression results in a massive reduction in storage utilization and significant time savings for all seismic-consuming workflows.

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Data Mining with Hadoop at TACC

Data Mining with Hadoop at TACC Data Mining with Hadoop at TACC Weijia Xu Data Mining & Statistics Data Mining & Statistics Group Main activities Research and Development Developing new data mining and analysis solutions for practical

More information

SRS BIO OPTICAL WORKFLOW

SRS BIO OPTICAL WORKFLOW SRS BIO OPTICAL WORKFLOW Version 2.0 22 nd March 2013 Data Workflows emii, the data management facility for IMOS, has developed workflows for each IMOS sub facility to describe the flow of IMOS data from

More information

Optimizing IT Deployment Issues

Optimizing IT Deployment Issues Optimizing IT Deployment Issues Trends and Challenges for Engineering Simulation Barbara Hutchings barbara.hutchings@ansys.com 1 Outline Deployment Challenges and Trends Extreme scale up and scale out

More information
