Data-centric Renovation of Scientific Workflow in the Age of Big Data

1 Data-centric Renovation of Scientific Workflow in the Age of Big Data
Ryong Lee, Ph.D.
Dept. of Scientific Big Data Research
Korea Institute of Science and Technology Information (KISTI), Korea

2 Outline
- Overview
- Scientific Data Projects in KISTI
- On-going Work: Data-centric Renovation of Scientific Workflow for Oceanographers
- Towards Further Scientific Data Customization Services
- Conclusions and Future Work

3 Scientific Data Projects in KISTI
Towards an Advanced Foundation for Data-Intensive Scientific Research
- National Scientific Data Governance: design and development of scientific data governance systems; establishment of an international co-operative network
- Scientific Data Mgt. and Sharing: development and distribution of scientific data management and sharing platforms
- Scientific Data Analytic Platforms: R&D of advanced technologies for big-data-based scientific work
Infrastructure: Supercomputer, Storage, HPC Network

4 A Novel Data Service Specialized in Scientific Big Data Customization
- Motivated by oceanographers' exploration and analysis of high-resolution, long-term remote sensing data from satellites
- Collaborating with KIOST (Korea Institute of Ocean Science & Technology) and KOPRI (Korea Polar Research Institute) on research projects in climate change analysis and red-tide analysis/detection
- Supporting data-intensive scientific analysis tasks: remote sensing data taken by satellites are overwhelming the scientists' processing capabilities
- Customizing big data for analytics is a strong and growing demand

5 Case Study 1: Exploring the Effects of Climate Change (KOPRI, Korea Polar Research Institute)
- Non-linear relationship between biology and environmental changes
- For better estimation, multiple remote sensing datasets need to be compared
- Array-based remote sensing data from satellites, at an unprecedented scale (hundreds of TBs), must be handled efficiently

6 Case Study 2: Red-tide Analysis/Prediction (KIOST, Korea Institute of Ocean Science & Technology)
- Figure: damage (KRW) caused by red tides on the south coast of Korea, 2002-2013, digitized from hand-made estimates
- Red tides cause serious economic damage, often unexpectedly
- Long-term, high-resolution remote sensing data should be examined intensively

7 Enhancing Remote Sensing Data Handling Capability
A novel challenge: overcoming the practical limits of handling global, high-resolution remote sensing big data
Figure: spatial resolution (low res. 1000 m, 500 m, high res. 250 m) plotted against spatial range (local to global). Scientists' current practice stays within the computational limit of most small-scale science labs; the high-resolution, global range remains an "untouchable domain."
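Why the high-resolution, global corner is out of reach follows from simple arithmetic: refining the pixel size by a factor k multiplies the number of pixels over the same area by k squared. The sketch below assumes square pixels and uses roughly 510 million km^2 (approximately the Earth's surface area) as the global extent; both are illustrative assumptions, not figures from the slides.

```python
# Back-of-envelope sketch: grid size vs. spatial resolution.
# Assumptions (not from the slides): square pixels, and a global extent of
# roughly 510 million km^2 (approximately the Earth's surface area).
EARTH_AREA_KM2 = 510e6


def cells_needed(resolution_m: float, area_km2: float = EARTH_AREA_KM2) -> float:
    """Approximate number of square pixels of side `resolution_m` covering `area_km2`."""
    pixel_area_km2 = (resolution_m / 1000.0) ** 2
    return area_km2 / pixel_area_km2


def growth_factor(coarse_m: float, fine_m: float) -> float:
    """How many times more pixels a fine grid needs than a coarse one over the same area."""
    return (coarse_m / fine_m) ** 2


for res in (1000, 500, 250):
    print(f"{res:>4} m: ~{cells_needed(res):.2e} cells per global snapshot, "
          f"{growth_factor(1000, res):.0f}x the 1000 m grid")
```

Moving from 1000 m to 250 m pixels therefore means about 16 times as many cells per global snapshot, before multiplying by the number of days or products in a long-term study.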

8 Realizing Oceanographers' Dream of a Better Working Environment: Renovation of the Remote Sensing Big Data Customization Process
Goal: Global Climate Change Analysis
- As-Is: MODIS data transform (L1A → ... → L3 SMI), then slow file-based data analysis (L3 level; global area, 1 km)
- To-Be: customized data transform → array-based data management → array-based data analysis
Goal: Red-tide Analysis/Detection
- As-Is: MODIS data transform (L1A → ... → L2), then slow file-based data analysis (L2 level; local area, 1/0.5/0.25 km)
- To-Be: customized data transform → array-based data management → array-based data analysis
Platform Design (KISTI): SciDB-based data management, replacing the bottleneck of the as-is process
- MODIS L1A, L1B, L2, L3 BIN and L3 SMI products are selectively loaded into SciDB as base arrays (e.g., A1, A2, A3; L3 SMI in SciDB)
- Derived arrays are produced by array manipulation and computation, including UDF extensions such as f(a1, A2) and f(a2, A3)
- Complex analysis runs in SciDB with parallel computing ("Quick and Smart!!")
User Environment for Scientific Data Analytics
- UI for customizing data transform functions: transform customization, monitoring of transforms, loading into the Array DBMS
- UI for array-based analysis functions: SciDB viewer, array data manipulation, array data provenance
- Data visualization functions: SciDB array visualization, display of satellite images, big data summarization
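The base-array/derived-array idea can be illustrated without SciDB. The hypothetical sketch below models a sparse lon x lat x time array as a Python dict, treats "selective loading" as a box selection over region and period, and derives a 2-D array of per-pixel temporal means. It does not use SciDB's actual operators or UDF interface; `select_box` and `temporal_mean` are invented names for illustration only.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical model: a sparse 3-D array keyed by (lon index, lat index, day index),
# holding one attribute value per cell (e.g. chlorophyll). Not SciDB's API.
Cell = Tuple[int, int, int]
Array3D = Dict[Cell, float]
Array2D = Dict[Tuple[int, int], float]


def select_box(base: Array3D, lon_rng: range, lat_rng: range, day_rng: range) -> Array3D:
    """'Selective loading': keep only cells inside the requested region and period."""
    return {(i, j, t): v for (i, j, t), v in base.items()
            if i in lon_rng and j in lat_rng and t in day_rng}


def temporal_mean(arr: Array3D) -> Array2D:
    """Derived array: per-pixel mean over the time dimension (empty cells are skipped)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for (i, j, _t), v in arr.items():
        sums[(i, j)] += v
        counts[(i, j)] += 1
    return {pixel: sums[pixel] / counts[pixel] for pixel in sums}


# Usage: a tiny base array; the cell at (5, 5) lies outside the selected region.
base = {(0, 0, 0): 1.0, (0, 0, 1): 3.0, (1, 0, 0): 2.0, (5, 5, 0): 9.0}
subset = select_box(base, range(0, 2), range(0, 1), range(0, 2))
print(temporal_mean(subset))  # {(0, 0): 2.0, (1, 0): 2.0}
```

Keeping the derived array separate from the base array mirrors the slide's split between loaded data and computed results: the base data stay untouched while new analyses add new derived arrays.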

9 A-DISC (Array-based Data-Intensive Scientific Computing Platform)
- What we pursue: supporting a variety of scientists' discoveries with big data, in a manner friendly to them
- What we enable: A-DISC, a scalable, versatile, scientist-friendly platform
- How we approach it: big data preprocessing with parallelism; scientific data customization; an interactive scientific data workspace
- What we conduct: HPC-based MODIS satellite remote sensing data translation (BigSat-Converter); Array-DBMS-based scientific data analysis support (Scientific Data Workspace)
- Scientific understanding supported: global ecology analysis; red-tide analysis and detection

10 Renovate the Workflow: HPC* and Array-based Data Processing
1. Customization: users specify customizing options for a customized data transform of NASA MODIS Aqua data
2. Transforming on parallelism: fast transform on parallel and distributed computing (SGE)
3. Big data store (transformed data management): user-customized data, ready for analysis
4. Loading into the Array DBMS: HDF-to-array transform and load (L1A, L1B, L2, L3)
5. Array-based data analytics: array data manipulation and analysis of scientific data on the Array DBMS, through web-based interactive big data analytics
* HPC: High-Performance Computing
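A minimal sketch of the core of an HDF-to-array step, once pixel values have been read from the file: map each observation's longitude and latitude onto a regular grid and average the values that land in the same cell. Reading HDF itself is omitted, `bin_to_grid` is a hypothetical name, and the grid convention (column 0 at 180 W, row 0 at 90 N) is an assumption rather than the system's documented behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bin_to_grid(obs: Iterable[Tuple[float, float, float]],
                n_lon: int, n_lat: int) -> Dict[Tuple[int, int], float]:
    """Average (lon, lat, value) observations into cells of a regular global grid.

    Assumed convention: column 0 starts at 180 W, row 0 starts at 90 N.
    Returns a sparse grid; only cells with at least one observation appear.
    """
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for lon, lat, value in obs:
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            continue  # skip observations with invalid geolocation
        i = min(int((lon + 180.0) * n_lon / 360.0), n_lon - 1)  # clamp lon = 180 into last column
        j = min(int((90.0 - lat) * n_lat / 180.0), n_lat - 1)   # clamp lat = -90 into last row
        sums[(i, j)] += value
        counts[(i, j)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Usage on a 1-degree grid: two nearby pixels share a cell, one invalid pixel is dropped.
obs = [(127.0, 37.5, 1.0), (127.01, 37.49, 3.0), (200.0, 0.0, 5.0)]
print(bin_to_grid(obs, n_lon=360, n_lat=180))  # {(307, 52): 2.0}
```

Simple averaging is only one possible binning rule; a production transform may weight or flag pixels differently.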

11 BigSat-Converter
A server-client system for customized transforms of remote sensing big data
- Currently converts MODIS Aqua/Terra L1A, L2, RGB, and L3 data on an HPC cluster
- Planning to support a variety of remote sensing data customizations

12 Enhancing the Power of Data Transformation with HPC
- User customization: specification of region/period/product/etc.
- Administration: managing the utilization of computing and storage resources
- Transforming on parallelism: master and worker nodes running parallel computing on Sun Grid Engine, connected by 10G InfiniBand
- Big data store (transformed data management): HDF-formatted data (L1A, L1B, L2, L3) and user-customized data, ready for analysis

Cluster configuration
  Master & 9 worker nodes   CPU: 2260 MHz, 8 cores   Memory: 18 GB   Storage: 250 GB
  NAS storage               15 TB, NTFS format
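The slide does not show how a request is divided among the worker nodes. The hypothetical sketch below shows one plausible decomposition: a region/period/product request becomes one independent task per day, and a pool of nine workers (matching the nine worker nodes) runs them in parallel. A local thread pool stands in for Sun Grid Engine, and `transform` is a placeholder for the real HDF conversion; all names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (lon_min, lat_min, lon_max, lat_max)


@dataclass(frozen=True)
class Customization:
    """Hypothetical user request: product, region and inclusive period."""
    product: str
    bbox: BBox
    start: date
    end: date


@dataclass(frozen=True)
class Task:
    product: str
    bbox: BBox
    day: date


def split_into_tasks(req: Customization) -> List[Task]:
    """One independent transform task per day of the requested period."""
    if req.end < req.start:
        raise ValueError("end date precedes start date")
    n_days = (req.end - req.start).days + 1
    return [Task(req.product, req.bbox, req.start + timedelta(days=k)) for k in range(n_days)]


def transform(task: Task) -> str:
    """Placeholder for the real HDF transform; returns a hypothetical output name."""
    return f"{task.product}_{task.day.isoformat()}.out"


def run(req: Customization, workers: int = 9) -> List[str]:
    """Execute all tasks on a worker pool; map() returns results in task order."""
    tasks = split_into_tasks(req)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, tasks))


# Usage: one month of an L2 product over an illustrative box around Korea.
req = Customization("L2", (124.0, 33.0, 131.0, 39.0), date(2014, 1, 1), date(2014, 1, 31))
outputs = run(req)
print(len(outputs), outputs[0], outputs[-1])
```

Per-day tasks are independent, which is what makes this kind of transform easy to spread across nodes. For CPU-bound conversions, separate processes or cluster jobs would do the real work; threads here only illustrate the scheduling.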

13 Significant Benefits of HPC in Remote Sensing Data Transform

                      L1B        L2         Total
  Single              60m 15s    74m 12s    134m 27s
  BigSat-Converter    7m 18s     10m 25s    17m 43s
  Speed-up            8.25x      7.12x      7.59x

- Figure: satellite images of the Korean peninsula
- Improvement of process and improvement of data model: legacy code vs. re-designed code
- Performance improvement: 7.59 times faster (from 1 month to 4 days)
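The speed-up row follows directly from the measured times: each entry is the single-run time divided by the BigSat-Converter time. This short script recomputes it from the durations given on the slide; `to_seconds` is a small helper written for this example.

```python
import re


def to_seconds(t: str) -> int:
    """Parse durations written like '60m 15s' into seconds."""
    m = re.fullmatch(r"\s*(\d+)m\s+(\d+)s\s*", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(m.group(1)) * 60 + int(m.group(2))


# stage: (single time, BigSat-Converter time), as reported on the slide
timings = {
    "L1B": ("60m 15s", "7m 18s"),
    "L2": ("74m 12s", "10m 25s"),
    "Total": ("134m 27s", "17m 43s"),
}

speedups = {}
for stage, (single, parallel) in timings.items():
    speedups[stage] = to_seconds(single) / to_seconds(parallel)
    print(f"{stage}: {speedups[stage]:.2f}x")
```

The recomputed values match the slide (8.25x, 7.12x and 7.59x), and the per-stage times add up to the totals.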

14 Array-based Remote Sensing Data Management and Manipulation
L1B RGB array
- Dimensions: longitude x latitude x time ( x x 3 days )
- Attributes: <Red, Green, Blue>
Swath
- 1 swath (in RGB) every 5 minutes, i.e., 288 swaths per day
- Image size: 136 x
L3 SMI in SciDB (Jan. 1-31, 2014)
- Dimensions (lon. x lat. x time): 4320 x 2160 x 31 (days)
- Attribute: <chlorophyll>
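Some quick arithmetic conveys the size of the L3 SMI array. A 4320 x 2160 global grid corresponds to cells of 1/12 degree, roughly 9 km at the equator. The sketch assumes one 4-byte float per cell for the chlorophyll attribute and an illustrative chunk size of 1000 x 1000 x 1; the chunk size is a hypothetical choice, not one reported on the slide.

```python
import math

# L3 SMI chlorophyll array from the slide: lon x lat x time
shape = (4320, 2160, 31)

cell_size_deg = 360 / shape[0]     # 1/12 degree per cell
cells = math.prod(shape)           # total number of cells
bytes_float32 = cells * 4          # assumption: one float32 attribute per cell

# Hypothetical chunking, chosen only for illustration
chunk = (1000, 1000, 1)
n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunk))

swaths_per_day = 24 * 60 // 5      # one RGB swath every 5 minutes

print(f"cell size: {cell_size_deg:.4f} deg")
print(f"cells: {cells:,} (~{bytes_float32 / 1e9:.2f} GB as float32)")
print(f"chunks: {n_chunks}")
print(f"swaths per day: {swaths_per_day}")
```

One month at this resolution is about 289 million cells, roughly 1.16 GB uncompressed under the float32 assumption; finer grids or longer periods grow this quickly.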

15 Scientific Data Workspace
An integrated workspace for scientific data manipulation and analysis
- Array data on clusters are easily accessible through a graphical user interface
- R-based analytical programming is supported on an HPC cluster

16 Applied in the Field: Red-tide Analysis
- A domestic news article about our achievement with KIOST on red-tide analysis
- A screenshot of our system in application

17 Summary: How We Enhanced Scientists' Capabilities with Big Data
Customization of remote sensing big data
- Public access to huge volumes of scientific data is increasing
- Scientists' data requests are diverse and are not satisfied by the forms provided by data publishers
- Customization services will become increasingly important
Array-based data manipulation and analysis
- Scientific data are often stored in formatted files (HDF, etc.)
- Adopting Array DBMS technology can speed up processing
- Data are loaded into the Array DBMS (for R-based analysis)
- Ultimately, scientists can work much more easily with the Array DBMS

18 Towards Further Scientific Data Customization Services
- Customization platforms for utilizing scientific big data are in growing demand, driven by public requests to resolve many natural and social problems
- Enhancing and accelerating scientific big data processes is interdisciplinary work that requires understanding domain knowledge in order to develop unprecedented technologies and systems
- It is not a simple combination of various IT technologies but rather a well-crafted work of art that should function as part of our society, forming an elementary foundation for the organic integration of various systems

19 Conclusions and Future Work
- We are currently conducting R&D on a scientific data customization service, focusing on remote sensing data transforms for practical requests
- We will continue to extend the system under development toward generic scientific data customization services, covering various scientific data as well as remote sensing data
- We always welcome international cooperation to share further R&D issues and valuable experience on common interests
Contact: Ryong Lee, Ph.D. (ryonglee@kisti.re.kr)

20 Thank you very much for your kind attention!!


Statistical Analysis and Visualization for Cyber Security Statistical Analysis and Visualization for Cyber Security Joanne Wendelberger, Scott Vander Wiel Statistical Sciences Group, CCS-6 Los Alamos National Laboratory Quality and Productivity Research Conference

More information

In this issue of CG&A, researchers share their

In this issue of CG&A, researchers share their Editor: Theresa-Marie Rhyne The Top 10 Challenges in Extreme-Scale Visual Analytics Pak Chung Wong Pacific Northwest National Laboratory Han-Wei Shen Ohio State University Christopher R. Johnson University

More information

General Parallel File System (GPFS) Native RAID For 100,000-Disk Petascale Systems

General Parallel File System (GPFS) Native RAID For 100,000-Disk Petascale Systems General Parallel File System (GPFS) Native RAID For 100,000-Disk Petascale Systems Veera Deenadhayalan IBM Almaden Research Center 2011 IBM Corporation Hard Disk Rates Are Lagging There have been recent

More information

Jeff Wolf Deputy Director HPC Innovation Center

Jeff Wolf Deputy Director HPC Innovation Center Public Presentation for Blue Gene Consortium Nov. 19, 2013 www.hpcinnovationcenter.com Jeff Wolf Deputy Director HPC Innovation Center This work was performed under the auspices of the U.S. Department

More information

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable

More information

Big Data Means at Least Three Different Things. Michael Stonebraker

Big Data Means at Least Three Different Things. Michael Stonebraker Big Data Means at Least Three Different Things. Michael Stonebraker The Meaning of Big Data - 3 V s Big Volume With simple (SQL) analytics With complex (non-sql) analytics Big Velocity Drink from a fire

More information

Windows Server 2012 授 權 說 明

Windows Server 2012 授 權 說 明 Windows Server 2012 授 權 說 明 PROCESSOR + CAL HA 功 能 相 同 的 記 憶 體 及 處 理 器 容 量 虛 擬 化 Windows Server 2008 R2 Datacenter Price: NTD173,720 (2 CPU) Packaging All features Unlimited virtual instances Per processor

More information

Bringing the Cloud Underground: Lessons for Bringing the Next IT Revolution to Geoscience

Bringing the Cloud Underground: Lessons for Bringing the Next IT Revolution to Geoscience Bringing the Cloud Underground: Lessons for Bringing the Next IT Revolution to Geoscience Grant Sanden and Yannai Segal Enersoft Inc. Summary This article describes the emerging technology of cloud computing

More information

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Carlo Cavazzoni CINECA Supercomputing Application & Innovation www.cineca.it 21 Aprile 2015 FERMI Name: Fermi Architecture: BlueGene/Q

More information

Supercomputing on Windows. Microsoft (Thailand) Limited

Supercomputing on Windows. Microsoft (Thailand) Limited Supercomputing on Windows Microsoft (Thailand) Limited W hat D efines S upercom puting A lso called High Performance Computing (HPC) Technical Computing Cutting edge problems in science, engineering and

More information

Early Cloud Experiences with the Kepler Scientific Workflow System

Early Cloud Experiences with the Kepler Scientific Workflow System Available online at www.sciencedirect.com Procedia Computer Science 9 (2012 ) 1630 1634 International Conference on Computational Science, ICCS 2012 Early Cloud Experiences with the Kepler Scientific Workflow

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent

More information

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel Big Data and Analytics: A Conceptual Overview Mike Park Erik Hoel In this technical workshop This presentation is for anyone that uses ArcGIS and is interested in analyzing large amounts of data We will

More information

Simple Introduction to Clusters

Simple Introduction to Clusters Simple Introduction to Clusters Cluster Concepts Cluster is a widely used term meaning independent computers combined into a unified system through software and networking. At the most fundamental level,

More information

EMBL Identity & Access Management

EMBL Identity & Access Management EMBL Identity & Access Management Rupert Lück EMBL Heidelberg e IRG Workshop Zürich Apr 24th 2008 Outline EMBL Overview Identity & Access Management for EMBL IT Requirements & Strategy Project Goal and

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Data Movement and Storage. Drew Dolgert and previous contributors

Data Movement and Storage. Drew Dolgert and previous contributors Data Movement and Storage Drew Dolgert and previous contributors Data Intensive Computing Location Viewing Manipulation Storage Movement Sharing Interpretation $HOME $WORK $SCRATCH 72 is a Lot, Right?

More information

Automating Big Data Benchmarking for Different Architectures with ALOJA

Automating Big Data Benchmarking for Different Architectures with ALOJA www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

Hadoop & SAS Data Loader for Hadoop

Hadoop & SAS Data Loader for Hadoop Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle

More information

bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24.

bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24. bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24. November 2010 Richling/Kredel (URZ/RUM) bwgrid Treff WS 2010/2011 1 / 17 Course

More information

XSEDE Data Analytics Use Cases

XSEDE Data Analytics Use Cases XSEDE Data Analytics Use Cases 14th Jun 2013 Version 0.3 XSEDE Data Analytics Use Cases Page 1 Table of Contents A. Document History B. Document Scope C. Data Analytics Use Cases XSEDE Data Analytics Use

More information

Data Management/Visualization on the Grid at PPPL. Scott A. Klasky Stephane Ethier Ravi Samtaney

Data Management/Visualization on the Grid at PPPL. Scott A. Klasky Stephane Ethier Ravi Samtaney Data Management/Visualization on the Grid at PPPL Scott A. Klasky Stephane Ethier Ravi Samtaney The Problem Simulations at NERSC generate GB s TB s of data. The transfer time for practical visualization

More information

A Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System

A Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System A Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System Young-Ho Kim, Eun-Ji Lim, Gyu-Il Cha, Seung-Jo Bae Electronics and Telecommunications

More information

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services Data Analytics at NERSC Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services NERSC User Meeting August, 2015 Data analytics at NERSC Science Applications Climate, Cosmology, Kbase, Materials,

More information

SURFsara HPC Cloud Workshop

SURFsara HPC Cloud Workshop SURFsara HPC Cloud Workshop doc.hpccloud.surfsara.nl UvA workshop 2016-01-25 UvA HPC Course Jan 2016 Anatoli Danezi, Markus van Dijk cloud-support@surfsara.nl Agenda Introduction and Overview (current

More information

Mississippi State University High Performance Computing Collaboratory Brief Overview. Trey Breckenridge Director, HPC

Mississippi State University High Performance Computing Collaboratory Brief Overview. Trey Breckenridge Director, HPC Mississippi State University High Performance Computing Collaboratory Brief Overview Trey Breckenridge Director, HPC Mississippi State University Public university (Land Grant) founded in 1878 Traditional

More information

How To Use Hadoop For Gis

How To Use Hadoop For Gis 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.

More information

Temporal variation in snow cover over sea ice in Antarctica using AMSR-E data product

Temporal variation in snow cover over sea ice in Antarctica using AMSR-E data product Temporal variation in snow cover over sea ice in Antarctica using AMSR-E data product Michael J. Lewis Ph.D. Student, Department of Earth and Environmental Science University of Texas at San Antonio ABSTRACT

More information

NITRD and Big Data. George O. Strawn NITRD

NITRD and Big Data. George O. Strawn NITRD NITRD and Big Data George O. Strawn NITRD Caveat auditor The opinions expressed in this talk are those of the speaker, not the U.S. government Outline What is Big Data? Who is NITRD? NITRD's Big Data Research

More information

MapReduce and Hadoop Distributed File System

MapReduce and Hadoop Distributed File System MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially

More information