The Ophidia framework: toward cloud- based big data analy;cs for escience Sandro Fiore, Giovanni Aloisio, Ian Foster, Dean Williams

Size: px
Start display at page:

Download "The Ophidia framework: toward cloud- based big data analy;cs for escience Sandro Fiore, Giovanni Aloisio, Ian Foster, Dean Williams"

Transcription

1 The Ophidia framework: toward cloud- based big data analy;cs for escience Sandro Fiore, Giovanni Aloisio, Ian Foster, Dean Williams Sandro Fiore, Ph.D. CMCC Scientific Computing and Operations Division

2 The big challenge is to model this complex system! Several complex processes to be simulated Several interacting processes Great range of time scales to be analyzed Great range of spatial scales to be considered Need interdisciplinar sciences (physics, chemistry, biology, geology, ) Inherently non-linear governing equations Need sophisticated numerics Need huge computational resources and large volumes of data can be produced Warren M. Washington NCAR Scien;fic Grand Challenges Workshop Series: Challenges in Climate Change Science and the Role of Compu;ng at the Extreme Scale DOE Workshop (ASCR- BER) November 6-7, 2008

3 ESGF & the CMIP5 data archive!

4 Climate change domain: the current scientific workflow and the ESGF use case! Workflow: search, locate, download, analyze, display results J. Chen, A. Choudhary, S. Feldman, B. Hendrickson, C.R. Johnson, R. Mount, V. Sarkar, V. White, D. Williams. Synergistic Challenges in Data- Intensive Science and Exascale Computing, DOE ASCAC Data Subcommittee Report, Department of Energy Office of Science, March, 2013.

5 The Ophidia Project Ophidia is a research effort carried out at the Euro Mediterranean Centre on Climate Change (CMCC) to address big data challenges, issues and requirements for climate change data analytics. S. Fiore, A. D Anca, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, Ophidia: toward bigdata analytics for escience, ICCS2013 Conference, Procedia Elsevier, Barcelona, June 5-7, 2013

6 Requirements and needs focus on: Time series analysis Data subsetting Model intercomparison Multimodel means Massive data reduction Data transformation (through array-based primitives) Param. Sweep experiments (same task applied on a set of data) Climate change signal Maps generation Ensemble analysis Data analytics worflow support But also Performance re-usability Data analytics requirements and use cases! extensibility

7 Ophidia Architecture Declarative language Front end Compute layer Standard interfaces Analytics Framework I/O layer Array-based primitives I/O server instance Storage layer System catalog New storage model Partitioning/hierarchical data mng

8 Array based primitives! Primitives provide array-based transformation A comprehensive set of primitives have been already implemented ( 100) By definition, a primitive is applied to a single fragment They come in the form of plugins (I/O server extensions) So far, Ophidia primitives perform data reduction, sub-setting, predicates evaluation, statistical analysis, compression, and so forth. Support is provided both for byte-oriented and bit-oriented arrays Plugins can be nested to get more complex functionalities Compression is provided as a primitive too Libraries like PetsC, GSL, C Math have been integrated

9 Array based primitives: OPH_BOXPLOT oph_boxplot(measure, "OPH_DOUBLE ) Single chunk or fragment (input) Single chunk or fragment (output)

10 Array based primitives: OPH_AGGREGATE oph_aggregate(measure,"oph_avg ) Single chunk or fragment (input) Ver%cal aggrega%on Single chunk or fragment (output)

11 Array based primitives: nesting feature oph_boxplot(oph_subarray(oph_uncompress(measure), 1,18), "OPH_DOUBLE ) Single chunk or fragment (input) Single chunk or fragment (output) subarray(measure, 1,18)

12 The analytics framework: datacube operators (about 50)! OPERATOR NAME OPERATOR DESCRIPTION Operators Data processing Domain-agnostic OPH_APPLY(datacube_in, datacube_out, array_based_primitive) OPH_DUPLICATE(datacube_ in, datacube_out) OPH_SUBSET(datacube_in, subset_string, datacube_out) OPH_MERGE(datacube_in, merge_param, datacube_out) OPH_SPLIT(datacube_in, split_param, datacube_out) OPH_INTERCOMPARISON (datacube_in1, datacube_in2, datacube_out) OPH_DELETE(datacube_in) Creates the datacube_out by applying the array-based primitive to the datacube_in Creates a copy of the datacube_in in the datacube_out Creates the datacube_out by doing a sub-setting of the datacube_in by applying the subset_string Creates the datacube_out by merging groups of merge_param fragments from datacube_in Creates the datacube_out by splitting into groups of split_param fragments each fragment of the datacube_in Creates the datacube_out which is the element-wise difference between datacube_in1 and datacube_in2 Removes the datacube_in Data Access (sequential and parallel operators) Metadata management (sequential and parallel operators) Data processing (parallel operators, MPI & OpenMP based) Import/Export (parallel operators) OPERATOR NAME OPERATOR DESCRIPTION Operators Data processing Domain-oriented OPH_EXPORT_NC Exports the datacube_in data into the (datacube_in, file_out) file_out NetCDF file. OPH_IMPORT_NC Imports the data stored into the file_in (file_in, datacube_out) NetCDF file into the new datacube_in datacube Operators Data access OPH_INSPECT_FRAG Inspects the data stored in the (datacube_in, fragment_in) fragment_in from the datacube_in OPH_PUBLISH(datacube_in) Publishes the datacube_in fragments into HTML pages Operators Metadata OPH_CUBE_ELEMENTS Provides the total number of the (datacube_in) elements in the datacube_in OPH_CUBE_SIZE Provides the disk space occupied by the (datacube_in) datacube_in OPH_LIST(void) Provides the list of available datacubes. OPH_CUBEIO(datacube_in) Provides the provenance information related to the datacube_in OPH_FIND(search_param) Provides the list of datacubes matching the search_param criteria

13 The analytics framework: datacube operators!

14 Metadata operator (provenance management)

15 EUBrazil Cloud Connect! The main objec%ve is the crea%on of a federated e- infrastructure for research using a user- centric approach. To achieve this, we need to pursue three objec%ves: Adapta;on of exis%ng applica%ons to tackle new scenarios emerging from coopera%on between Europe and Brazil relevant to both regions. Integra%on of frameworks and programming models for scien;fic gateways and complex workflows. Federa%on of resources, to build up a general- purpose infrastructure comprising exis;ng and heterogeneous resources Addi%onally, EUBrazilCC will: perform an ac%ve dissemina;on campaign, analyse innova;on, foster the involvement of Brazilian ins%tu%ons in cloud standards defini;on, and bring the EU Cloudscape series to broader interna%onal audience. EU Coordinator Ignacio Blanquer- Espert, Universitat Politècnica de València, Spain BR Coordinator Francisco Vilar Brasileiro, Universidade Federal de Campina Grande, Brazil!

16 Use Case on Biodiversity and Climate Change! Objec;ve: Understand the impact of climate change on terrestrial biodiversity through two workflows based on Earth observa%on and ground level data. Technical Challenge: Integrate parallel data analysis with other processing workflows in a geographically distributed environment. Interna;onal Added Value: Integra%on of biodiversity data and modelling with mul%spectral and remote sensing data for studying the cross- correla%on of biodiversity and climate change. Climate & Biodiversity Clearing- house Parallel Data Analysis Species- Link CMCC CIMP5 Imaging Data Federated Infrastructure & Pla\orm

17 Cloud-based deployment scenarios! Deployment A I/ O S C VM Instance Legend Applica%on Image Data Image Deployment B VM Instance S I/ O I/ O I/ O C VM Instance 1 VM Instance VM C Instance n C Data IO Component Compute Component Server Component Deployment C VM Instance 1 VM Instance 1 VM Instance S VM Instance VM Instance C n C C I/ O I/ O I/ O VM Instance VM C Instance m C

18 References! Papers [1] G. Aloisio, S. Fiore, I. Foster, D. N. Williams, Scientific big data analytics challenges at large scale, Big Data and Extreme-scale Computing (BDEC), April 30 to May 01, 2013, Charleston, USA (position paper). [2] S. Fiore, G. Aloisio, I. Foster, D. N. Williams, A software infrastructure for big data analytics, Big Data and Extreme-scale Computing (BDEC), February 26-28, 2014, Fukuoka, Japan (position paper). [3] S. Fiore, A. D'Anca, C. Palazzo, I. Foster, Dean N. Williams, Giovanni Aloisio, Ophidia: Toward Big Data Analytics for escience, ICCS 2013, June 5-7, 2013 Barcelona, Spain, Procedia Computer Science, Elsevier, pp [4] S. Fiore, C. Palazzo, A. D Anca, I. Foster, D. N. Williams, G. Aloisio, A big data analytics framework for scientific data management, Workshop on Big Data and Science: Infrastructure and Services, IEEE International Conference on BigData 2013, October 6-9, 2013, Santa Clara, USA, pp [5] S. Fiore, A. D'Anca, D. Elia, C. Palazzo, I. Foster, D. Williams, G. Aloisio, "Ophidia: A Full Software Stack for Scientific Data Analytics, proc. of the 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), July 21 25, 2014, Bologna, Italy, pp , ISBN: For more info, please contact: - Dr. Sandro Fiore - Prof. Giovanni Aloisio

19 Questions?!

The Ophidia framework: toward big data analy7cs for climate change

The Ophidia framework: toward big data analy7cs for climate change The Ophidia framework: toward big data analy7cs for climate change Dr. Sandro Fiore Leader, Scientific data management research group Scientific Computing Division @ CMCC Prof. Giovanni Aloisio Director,

More information

Data analy(cs workflows for climate

Data analy(cs workflows for climate Data analy(cs workflows for climate Dr. Sandro Fiore Leader, Scientific data management research group Scientific Computing Division @ CMCC Prof. Giovanni Aloisio Director, Scientific Computing Division

More information

Major Goals. Ignacio Blanquer Universitat Politècnica de València European Coordinator. Wagner Meira Jr. Universidade Federal de Minas Gerais

Major Goals. Ignacio Blanquer Universitat Politècnica de València European Coordinator. Wagner Meira Jr. Universidade Federal de Minas Gerais Major Goals Ignacio Blanquer Universitat Politècnica de València European Coordinator Directorate General for Communica1ons Networks, Content and Technology So=ware and Services, Cloud Presented by Dorgival

More information

A big data analytics framework for scientific data management

A big data analytics framework for scientific data management 2013 IEEE International Conference on Big Data A big data analytics framework for scientific data management Sandro Fiore 1,2*, Cosimo Palazzo 1,2, Alessandro D Anca 1, Ian Foster 3, Dean N. Williams 4,

More information

The ORIENTGATE data platform

The ORIENTGATE data platform Seminar on Proposed and Revised set of indicators June 4-5, 2014 - Belgrade (Serbia) The ORIENTGATE data platform WP2, Action 2.4 Alessandra Nuzzo, Sandro Fiore, Giovanni Aloisio Scientific Computing and

More information

Future computing platforms for biodiversity science

Future computing platforms for biodiversity science www.bsc.es Future computing platforms for biodiversity science Daniele Lezzi Rome, 5 September 2013 Motivation Lack of service integration and interoperability of research e- Infrastructure e-irg 2013

More information

Big Data Research at DKRZ

Big Data Research at DKRZ Big Data Research at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scien:fic Compu:ng Research Group Symposium Big Data in Science Karlsruhe October 7th, 2014 Big Data in Climate Research Big

More information

IS-ENES WP3. D3.8 - Report on Training Sessions

IS-ENES WP3. D3.8 - Report on Training Sessions IS-ENES WP3 D3.8 - Report on Training Sessions Abstract: The deliverable D3.8 describes the organization and the outcomes of the tutorial meetings on the Grid Prototype designed within task4 of the NA2

More information

ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P.

ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. Kushner, D. Waliser, S. Pascoe, A. Stephens, P. Kershaw,

More information

Efficient and Scalable Climate Metadata Management with the GRelC DAIS

Efficient and Scalable Climate Metadata Management with the GRelC DAIS Efficient and Scalable Climate Metadata Management with the GRelC DAIS G. Aloisio, S. Fiore CMCC Scientific Computing and Operations Division University of Salento, Lecce Context : countdown of the Intergovernmental

More information

CMIP6 Data Management at DKRZ

CMIP6 Data Management at DKRZ CMIP6 Data Management at DKRZ icas2015 Annecy, France on 13 17 September 2015 Michael Lautenschlager Deutsches Klimarechenzentrum (DKRZ) With contributions from ESGF Executive Committee and WGCM Infrastructure

More information

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013

Return on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013 Return on Experience on Cloud Compu2ng Issues a stairway to clouds Experts Workshop Agenda InGeoCloudS SoCware Stack InGeoCloudS Elas2city and Scalability Elas2c File Server Elas2c Database Server Elas2c

More information

The ORIENTGATE data platform

The ORIENTGATE data platform Research Papers Issue RP0195 December 2013 The ORIENTGATE data platform SCO Scientific Computing and Operations Division By Alessandra Nuzzo University of Salento and Scientific Computing and Operations

More information

NASA's Strategy and Activities in Server Side Analytics

NASA's Strategy and Activities in Server Side Analytics NASA's Strategy and Activities in Server Side Analytics Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at the ESGF/UVCDAT Conference Lawrence Livermore National Laboratory

More information

Data Management and Analysis in Support of DOE Climate Science

Data Management and Analysis in Support of DOE Climate Science Data Management and Analysis in Support of DOE Climate Science August 7 th, 2013 Dean Williams, Galen Shipman Presented to: Processing and Analysis of Very Large Data Sets Workshop The Climate Data Challenge

More information

Denis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity

Denis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity Cloud computing et Virtualisation : applications au domaine de la Finance Denis Caromel, CEO Ac.veEon Orchestrate and Accelerate Applica.ons Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst

More information

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management

More information

Towards a New Model for the Infrastructure Grid

Towards a New Model for the Infrastructure Grid INTERNATIONAL ADVANCED RESEARCH WORKSHOP ON HIGH PERFORMANCE COMPUTING AND GRIDS Cetraro (Italy), June 30 - July 4, 2008 Panel: From Grids to Cloud Services Towards a New Model for the Infrastructure Grid

More information

A Seman(c Approach to Cloud Infrastructure Management at Freudenberg IT

A Seman(c Approach to Cloud Infrastructure Management at Freudenberg IT Michael Schmidt A Seman(c Approach to Cloud Infrastructure Management at Freudenberg IT Timo Weber San Francisco, June 03, 2013 Freudenberg Group Facts & Figures Freudenberg IT a company of Freudenberg

More information

NASA s Big Data Challenges in Climate Science

NASA s Big Data Challenges in Climate Science NASA s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014 1 2 7-km GEOS-5 Nature Run

More information

Data Semantics Aware Cloud for High Performance Analytics

Data Semantics Aware Cloud for High Performance Analytics Data Semantics Aware Cloud for High Performance Analytics Microsoft Future Cloud Workshop 2011 June 2nd 2011, Prof. Jun Wang, Computer Architecture and Storage System Laboratory (CASS) Acknowledgement

More information

Microsoft Research Worldwide Presence

Microsoft Research Worldwide Presence Microsoft Research Worldwide Presence MSR India MSR New England Redmond Redmond, Washington Sept, 1991 San Francisco, California Jun, 1995 Cambridge, United Kingdom July, 1997 Beijing, China Nov, 1998

More information

Integration strategy

Integration strategy C3-INAD and ESGF: Integration strategy C3-INAD Middleware Team: Stephan Kindermann, Carsten Ehbrecht [DKRZ] Bernadette Fritzsch [AWI] Maik Jorra, Florian Schintke, Stefan Plantikov [ZUSE Institute] Markus

More information

Big Data and Clouds: Challenges and Opportuni5es

Big Data and Clouds: Challenges and Opportuni5es Big Data and Clouds: Challenges and Opportuni5es NIST January 15 2013 Geoffrey Fox gcf@indiana.edu h"p://www.infomall.org h"p://www.futuregrid.org School of Informa;cs and Compu;ng Digital Science Center

More information

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

CMIP5 Data Management CAS2K13

CMIP5 Data Management CAS2K13 CMIP5 Data Management CAS2K13 08. 12. September 2013, Annecy Michael Lautenschlager (DKRZ) With Contributions from ESGF CMIP5 Core Data Centres PCMDI, BADC and DKRZ Status DKRZ Data Archive HLRE-2 archive

More information

NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015

NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015 NERSC Data Efforts Update Prabhat Data and Analytics Group Lead February 23, 2015-1 - A little bit about myself Computer Scien.st Brown, IIT Delhi Real- 3me Graphics, Virtual Reality, HCI Computa3onal

More information

From classical web mapping publica2on to INSPIRE service architecture in the Cloud InGeoCloudS BRGM, Pierre LAGARDE

From classical web mapping publica2on to INSPIRE service architecture in the Cloud InGeoCloudS BRGM, Pierre LAGARDE From classical web mapping publica2on to INSPIRE service architecture in the Cloud InGeoCloudS BRGM, Pierre LAGARDE The classical approach of the environmental data dissemina3on 2 The classical approach

More information

Federated Big Data for resource aggregation and load balancing with DIRAC

Federated Big Data for resource aggregation and load balancing with DIRAC Procedia Computer Science Volume 51, 2015, Pages 2769 2773 ICCS 2015 International Conference On Computational Science Federated Big Data for resource aggregation and load balancing with DIRAC Víctor Fernández

More information

Insights from 40 years of data services at NCAR s Research Data Archive. Douglas Schuster and Steven Worley

Insights from 40 years of data services at NCAR s Research Data Archive. Douglas Schuster and Steven Worley Insights from 40 years of data services at NCAR s Research Archive Douglas Schuster (schuster@ucar.edu) and Steven Worley (worley@ucar.edu) Highlights What is the? Evolution of Services User Registration

More information

Exploration of adaptive network transfer for 100 Gbps networks Climate100: Scaling the Earth System Grid to 100Gbps Network

Exploration of adaptive network transfer for 100 Gbps networks Climate100: Scaling the Earth System Grid to 100Gbps Network Exploration of adaptive network transfer for 100 Gbps networks Climate100: Scaling the Earth System Grid to 100Gbps Network February 1, 2012 Project period of April 1, 2011 through December 31, 2011 Principal

More information

High Performance Computing in Horizon 2020. February 26-28, 2014 Fukuoka Japan

High Performance Computing in Horizon 2020. February 26-28, 2014 Fukuoka Japan High Performance Computing in Horizon 2020 Big Data and Extreme Scale Computing Workshop 51214 February 26-28, 2014 Fukuoka Japan Excellence in Science DG CONNECT European Commission Jean-Yves Berthou

More information

Data-Intensive Science and Scientific Data Infrastructure

Data-Intensive Science and Scientific Data Infrastructure Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific

More information

PART 1. Representations of atmospheric phenomena

PART 1. Representations of atmospheric phenomena PART 1 Representations of atmospheric phenomena Atmospheric data meet all of the criteria for big data : they are large (high volume), generated or captured frequently (high velocity), and represent a

More information

NCDC Strategic Vision

NCDC Strategic Vision NOAA s National Climatic Data Center World s Largest Archive of Climate and Weather Data Presented to: Coastal Environmental Disasters Data Management Workshop September 16, 2014 Stephen Del Greco Deputy

More information

The data explosion is transforming science

The data explosion is transforming science Talk Outline The data tsunami and the 4 th paradigm of science The challenges for the long tail of science Where is the cloud being used now? The app marketplace SMEs Analytics as a service. What are the

More information

CLOUD BASED N-DIMENSIONAL WEATHER FORECAST VISUALIZATION TOOL WITH IMAGE ANALYSIS CAPABILITIES

CLOUD BASED N-DIMENSIONAL WEATHER FORECAST VISUALIZATION TOOL WITH IMAGE ANALYSIS CAPABILITIES CLOUD BASED N-DIMENSIONAL WEATHER FORECAST VISUALIZATION TOOL WITH IMAGE ANALYSIS CAPABILITIES M. Laka-Iñurrategi a, I. Alberdi a, K. Alonso b, M. Quartulli a a Vicomteh-IK4, Mikeletegi pasealekua 57,

More information

Percipient StorAGe for Exascale Data Centric Computing

Percipient StorAGe for Exascale Data Centric Computing Percipient StorAGe for Exascale Data Centric Computing per cip i ent (pr-sp-nt) adj. Having the power of perceiving, especially perceiving keenly and readily. n. One that perceives. Introducing: Seagate

More information

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Big Data Mining Services and Knowledge Discovery Applications on Clouds Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades

More information

GRNET Cloud Compu7ng Services An Overview

GRNET Cloud Compu7ng Services An Overview GRNET Cloud Compu7ng Services An Overview Panos Louridas louridas@grnet.gr GN3 Innova7on Workshop Copenhagen, October 10 11 2011 Greek Research and Technology Network 2 Outline u okeanos IaaS u pithos

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

The THREDDS Data Repository: for Long Term Data Storage and Access

The THREDDS Data Repository: for Long Term Data Storage and Access 8B.7 The THREDDS Data Repository: for Long Term Data Storage and Access Anne Wilson, Thomas Baltzer, John Caron Unidata Program Center, UCAR, Boulder, CO 1 INTRODUCTION In order to better manage ever increasing

More information

Next Generation Networking for Science Program Update. NGNS Program Managers Richard Carlson Thomas Ndousse ASCAC meeting 11/21/2014

Next Generation Networking for Science Program Update. NGNS Program Managers Richard Carlson Thomas Ndousse ASCAC meeting 11/21/2014 Next Generation Networking for Science Program Update NGNS Program Managers Richard Carlson Thomas Ndousse ASCAC meeting 11/21/2014 NGNS Program Mission The Goal of the program is to research, develop,

More information

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated

More information

Webcasting vs. Web Conferencing. Webcasting vs. Web Conferencing

Webcasting vs. Web Conferencing. Webcasting vs. Web Conferencing Webcasting vs. Web Conferencing 0 Introduction Webcasting vs. Web Conferencing Aside from simple conference calling, most companies face a choice between Web conferencing and webcasting. These two technologies

More information

How To Build An Elastic Distributed System Over Big Data. http://esgf.org

How To Build An Elastic Distributed System Over Big Data. http://esgf.org How To Build An Elastic Distributed System Over Big Data Overview! The Earth System Grid Federation (ESGF.org) is indeed a spontaneous, unfunded, community-driven, open-source, collaborative effort to

More information

Introduc)on of Pla/orm ISF. Weina Ma Weina.Ma@uoit.ca

Introduc)on of Pla/orm ISF. Weina Ma Weina.Ma@uoit.ca Introduc)on of Pla/orm ISF Weina Ma Weina.Ma@uoit.ca Agenda Pla/orm ISF Product Overview Pla/orm ISF Concepts & Terminologies Self- Service Applica)on Management Applica)on Example Deployment Examples

More information

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer Research Data Alliance: Current Activities and Expected Impact SGBD Workshop, May 2014 Herman Stehouwer The Vision 2 Researchers and innovators openly share data across technologies, disciplines, and countries

More information

MicroStrategy Course Catalog

MicroStrategy Course Catalog MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY

More information

Big process for big data

Big process for big data Big process for big data Process automa9on for data- driven science Ian Foster Computa9on Ins9tute Argonne Na9onal Laboratory & The University of Chicago Talk at Astroinforma9cs 2012, Redmond, September

More information

CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk

CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk CEDA Storage Dr Matt Pritchard Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk How we store our data NAS Technology Backup JASMIN/CEMS CEDA Storage Data stored as files on disk. Data is migrated

More information

Selected Ac)vi)es Overseen by the WDAC

Selected Ac)vi)es Overseen by the WDAC Selected Ac)vi)es Overseen by the WDAC Some background obs4mips ana4mips CREATE- IP (in planning stage) Contribu)ons from: M. Bosilovich, O. Brown, V. Eyring, R. Ferraro., P. Gleckler, R. Joseph, J. PoQer,

More information

Cloud Computing for Fundamental Spatial Operations on Polygonal GIS Data

Cloud Computing for Fundamental Spatial Operations on Polygonal GIS Data 1 Cloud Computing for Fundamental Spatial Operations on Polygonal GIS Data Dinesh Agarwal, Satish Puri, Xi He, and Sushil K. Prasad 1 Department of Computer Science Georgia State University Atlanta - 30303,

More information

Cloud Computing @ JPL Science Data Systems

Cloud Computing @ JPL Science Data Systems Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011 Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1:

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

The Evolu*on of Service Management

The Evolu*on of Service Management The Evolu*on of Extending Disciplines Across the Enterprise Michael Jones Regional CTO - Architecture Michael.Jones@servicenow.com 2015 Now All Rights Reserved 1 How work gets done today! Emails Spreadsheets

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent

More information

Network for Sustainable Ultrascale Computing (NESUS) www.nesus.eu

Network for Sustainable Ultrascale Computing (NESUS) www.nesus.eu Network for Sustainable Ultrascale Computing (NESUS) www.nesus.eu Objectives of the Action Aim of the Action: To coordinate European efforts for proposing realistic solutions addressing major challenges

More information

A General Approach to Real-time Workflow Monitoring Karan Vahi, Ewa Deelman, Gaurang Mehta, Fabio Silva

A General Approach to Real-time Workflow Monitoring Karan Vahi, Ewa Deelman, Gaurang Mehta, Fabio Silva A General Approach to Real-time Workflow Monitoring Karan Vahi, Ewa Deelman, Gaurang Mehta, Fabio Silva USC Information Sciences Institute Ian Harvey, Ian Taylor, Kieran Evans, Dave Rogers, Andrew Jones,

More information

HPC technology and future architecture

HPC technology and future architecture HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr

More information

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Research Group Scientific Computing Faculty of Computer Science University of Vienna AUSTRIA http://www.par.univie.ac.at

More information

The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers Environment

The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers Environment Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 131-137 131 Open Access The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers

More information

Interactive Data Visualization with Focus on Climate Research

Interactive Data Visualization with Focus on Climate Research Interactive Data Visualization with Focus on Climate Research Michael Böttinger German Climate Computing Center (DKRZ) 1 Agenda Visualization in HPC Environments Climate System, Climate Models and Climate

More information

Managing Cloud Service Provisioning and SLA Enforcement via Holistic Monitoring Techniques Vincent C. Emeakaroha

Managing Cloud Service Provisioning and SLA Enforcement via Holistic Monitoring Techniques Vincent C. Emeakaroha Managing Cloud Service Provisioning and SLA Enforcement via Holistic Monitoring Techniques Vincent C. Emeakaroha Matrikelnr: 0027525 vincent@infosys.tuwien.ac.at Supervisor: Univ.-Prof. Dr. Schahram Dustdar

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

NERSC REQUIREMENTS FOR ENABLING CS RESEARCH IN EXASCALE DATA ANALYTICS

NERSC REQUIREMENTS FOR ENABLING CS RESEARCH IN EXASCALE DATA ANALYTICS NERSC REQUIREMENTS FOR ENABLING CS RESEARCH IN EXASCALE DATA ANALYTICS Nagiza F. Samatova samatovan@ornl.gov Oak Ridge Na5onal Laboratory North Carolina State University January 5, 2011 ACKNOWLEDGEMENTS

More information

European Data Infrastructure - EUDAT Data Services & Tools

European Data Infrastructure - EUDAT Data Services & Tools European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28

More information

Asynchronous Data Mining Tools at the GES-DISC

Asynchronous Data Mining Tools at the GES-DISC Asynchronous Data Mining Tools at the GES-DISC Long B. Pham, Stephen W. Berrick, Christopher S. Lynnes and Eunice K. Eng NASA Goddard Space Flight Center Distributed Active Archive Center Introduction

More information

OT- Med: Objec,va Terra - Mediterraneum. Joël Guiot

OT- Med: Objec,va Terra - Mediterraneum. Joël Guiot OT- Med: Objec,va Terra - Mediterraneum Joël Guiot Context The OT- Med Labex has been defined in this context q The Mediterranean Basin has been a key area of human- environment interac@ons for thousands

More information

A Technology for BigData Analysis Task Description using Domain-Specific Languages

A Technology for BigData Analysis Task Description using Domain-Specific Languages A Technology for Big Analysis Task Description using Domain-Specific Languages Sergey V. Kovalchuk 1, Artem V. Zakharchuk 1, Jiaqi Liao 1, Sergey V. Ivanov 1, Alexander V. Boukhanovsky 1,2 1 ITMO University,

More information

Big Data and Scientific Discovery

Big Data and Scientific Discovery Big Data and Scientific Discovery Bill Harrod Office of Science William.Harrod@science.doe.gov! February 26, 2014! Big Data and Scien*fic Discovery Next genera*on scien*fic breakthroughs require: Major

More information

Towards Analytical Data Management for Numerical Simulations

Towards Analytical Data Management for Numerical Simulations Towards Analytical Data Management for Numerical Simulations Ramon G. Costa, Fábio Porto, Bruno Schulze {ramongc, fporto, schulze}@lncc.br National Laboratory for Scientific Computing - RJ, Brazil Abstract.

More information

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10

More information

Software Defined Networking for Extreme- Scale Science: Data, Compute, and Instrument Facilities

Software Defined Networking for Extreme- Scale Science: Data, Compute, and Instrument Facilities ASCR Intelligent Network Infrastructure Software Defined Networking for Extreme- Scale Science: Data, Compute, and Instrument Facilities Report of DOE ASCR Intelligent Network Infrastructure Workshop August

More information

Generalized Architecture for Dynamic Infrastructure Services

Generalized Architecture for Dynamic Infrastructure Services Generalized Architecture for Dynamic Infrastructure Services On behalf of GEYSERS Consor3um (presented by Pascale Vicat Blanc, GEYSERS Dissemina3on WPL) GLIF 2010, CERN, France 14 th october 2010 Grant

More information

High Performance Computing OpenStack Options. September 22, 2015

High Performance Computing OpenStack Options. September 22, 2015 High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,

More information

GeoKettle: A powerful open source spatial ETL tool

GeoKettle: A powerful open source spatial ETL tool GeoKettle: A powerful open source spatial ETL tool FOSS4G 2010 Dr. Thierry Badard, CTO Spatialytics inc. Quebec, Canada tbadard@spatialytics.com Barcelona, Spain Sept 9th, 2010 What is GeoKettle? It is

More information

UK Location Programme

UK Location Programme Location Information Interoperability Board Data Publisher How To Guide Understand the background to establishing an INSPIRE View Service using GeoServer DOCUMENT CONTROL Change Summary Version Date Author/Editor

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

Lustre * Filesystem for Cloud and Hadoop *

Lustre * Filesystem for Cloud and Hadoop * OpenFabrics Software User Group Workshop Lustre * Filesystem for Cloud and Hadoop * Robert Read, Intel Lustre * for Cloud and Hadoop * Brief Lustre History and Overview Using Lustre with Hadoop Intel Cloud

More information

Ibis: Scaling Python Analy=cs on Hadoop and Impala

Ibis: Scaling Python Analy=cs on Hadoop and Impala Ibis: Scaling Python Analy=cs on Hadoop and Impala Wes McKinney, Budapest BI Forum 2015-10- 14 @wesmckinn 1 Me R&D at Cloudera Serial creator of structured data tools / user interfaces Mathema=cian MIT

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Chao Chen 1 Michael Lang 2 Yong Chen 1. IEEE BigData, 2013. Department of Computer Science Texas Tech University

Chao Chen 1 Michael Lang 2 Yong Chen 1. IEEE BigData, 2013. Department of Computer Science Texas Tech University Chao Chen 1 Michael Lang 2 1 1 Data-Intensive Scalable Laboratory Department of Computer Science Texas Tech University 2 Los Alamos National Laboratory IEEE BigData, 2013 Outline 1 2 3 4 Outline 1 2 3

More information

Maximising the utility of OpeNDAP datasets through the NetCDF4 API

Maximising the utility of OpeNDAP datasets through the NetCDF4 API Maximising the utility of OpeNDAP datasets through the NetCDF4 API Stephen Pascoe (Stephen.Pascoe@stfc.ac.uk) Chris Mattmann (chris.a.mattmann@jpl.nasa.gov) Phil Kershaw (Philip.Kershaw@stfc.ac.uk) Ag

More information

The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology

The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology Send Orders for Reprints to reprints@benthamscience.ae 1582 The Open Cybernetics & Systemics Journal, 2015, 9, 1582-1586 Open Access The Construction of Seismic and Geological Studies' Cloud Platform Using

More information

Applying Apache Hadoop to NASA s Big Climate Data!

Applying Apache Hadoop to NASA s Big Climate Data! National Aeronautics and Space Administration Applying Apache Hadoop to NASA s Big Climate Data! Use Cases and Lessons Learned! Glenn Tamkin (NASA/CSC)!! Team: John Schnase (NASA/PI), Dan Duffy (NASA/CO),!

More information

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan Development of Earth Science Observational Data Infrastructure of Taiwan Fang-Pang Lin National Center for High-Performance Computing, Taiwan GLIF 13, Singapore, 4 Oct 2013 The Path from Infrastructure

More information

Elastic Management of Cluster based Services in the Cloud

Elastic Management of Cluster based Services in the Cloud First Workshop on Automated Control for Datacenters and Clouds (ACDC09) June 19th, Barcelona, Spain Elastic Management of Cluster based Services in the Cloud Rafael Moreno Vozmediano, Ruben S. Montero,

More information

Next-Generation Networking for Science

Next-Generation Networking for Science Next-Generation Networking for Science ASCAC Presentation March 23, 2011 Program Managers Richard Carlson Thomas Ndousse Presentation

More information

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and

More information

Optimizing Mass Storage Organization and Access for Multi-Dimensional Scientific Data

Optimizing Mass Storage Organization and Access for Multi-Dimensional Scientific Data Optimizing Mass Storage Organization and Access for Multi-Dimensional Scientific Data Robert Drach, Susan W. Hyer, Steven Louis, Gerald Potter, George Richmond Lawrence Livermore National Laboratory Livermore,

More information

Scalus A)ribute Workshop. Paris, April 14th 15th

Scalus A)ribute Workshop. Paris, April 14th 15th Scalus A)ribute Workshop Paris, April 14th 15th Content Mo=va=on, objec=ves, and constraints Scalus strategy Scenario and architectural views How the architecture works Mo=va=on for this MCITN Storage

More information

Big Data Services at DKRZ

Big Data Services at DKRZ Big Data Services at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scientific Computing Research Group MPI-M Seminar Hamburg, March 31st, 2015 Big Data in Climate Research Big data is an all-encompassing

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

Scientific Computing's Productivity Gridlock and How Software Engineering Can Help

Scientific Computing's Productivity Gridlock and How Software Engineering Can Help Scientific Computing's Productivity Gridlock and How Software Engineering Can Help Stuart Faulk, Ph.D. Computer and Information Science University of Oregon Stuart Faulk CSESSP 2015 Outline Challenges

More information

JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill

JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill JASMIN Cloud ESGF and UV- CDAT Conference 09-11 December 2014 STFC / Stephen Kill Philip Kershaw (1, 2), Jonathan Churchill (5), Bryan Lawrence (1, 3, 4), Stephen Pascoe (1, 4) and MaE Pritchard (1) Centre

More information

Data Requirements from NERSC Requirements Reviews

Data Requirements from NERSC Requirements Reviews Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community

More information

Globus Genomics Tutorial GlobusWorld 2014

Globus Genomics Tutorial GlobusWorld 2014 Globus Genomics Tutorial GlobusWorld 2014 Agenda Overview of Globus Genomics Example Collaborations Demonstration Globus Genomics interface Globus Online integration Scenario 1: Using Globus Genomics for

More information

Science Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins- Diehr wilkinsn@sdsc.edu

Science Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins- Diehr wilkinsn@sdsc.edu Science Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins- Diehr wilkinsn@sdsc.edu What is a science gateway? science gateway /sī əәns gāt wā / n. 1. an

More information

OpenStack @ Cisco. Daneyon Hansen 3/28/2012. 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confiden+al 1

OpenStack @ Cisco. Daneyon Hansen 3/28/2012. 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confiden+al 1 OpenStack @ Cisco Daneyon Hansen 3/28/2012 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confiden+al 1 Customer value is moving up into sogware and web services Virtualiza+on and internet

More information