HPC technology and future architecture


HPC technology and future architecture
Visual Analysis for Extremely Large-Scale Scientific Computing
KGT2 Internal Meeting
INRIA, France
Benoit Lange (benoit.lange@inria.fr)
Toàn Nguyên (toan.nguyen@inria.fr)

Outline
The VELaSSCo project
  - General information
  - Members of the consortium
Motivations of the project
Objectives of the project
  - Target data
  - Develop a Big Data platform
The VELaSSCo architecture
  - Big Data, what does it mean?
  - Data of Big Data
  - What are the challenges of Big Data
  - Grid vs Cloud
  - Big Data needs a distributed system
Benoit Lange - VELaSSCo - KGT2 - benoit.lange@inria.fr - Xi'An 7 May 2015

The VELaSSCo project: General information
VELaSSCo is an EC-funded project that deals with end-user visualization of huge simulation data (Big Data).
  - 3-year project (2014-2016)
  - By 2020, the most crucial simulation results, such as those from the aeronautic or automotive industries, will no longer fit on a single machine or server.
  - How to store, access, simplify and manipulate billions of records to extract the relevant information?
  - How to represent information in a feasible and flexible way?
  - How to visualise and interactively inspect the huge quantity of information that simulations produce, taking end-users' needs into account?

The VELaSSCo project: Members of the consortium
Expertise:
  - Big Data Infrastructure
  - Data Analytics
  - Visualization
  - End-users / Beneficiaries
Topics covered:
  - Big Data issues
  - HPC and Big Data
  - Handling, formatting, storage
  - Data access, extraction, reduction
  - Platforms
  - FEM models, DEM models, LB models
  - End-user testing
  - Usability verification
  - Reactivity
Members by country:
  - Spain: ATOS, CIMNE
  - United Kingdom: UNEDIN
  - Norway: SINTEF, JOTNE
  - France: INRIA
  - Germany: FRAUNHOFER

Motivations of the project
[Figure: simulation workflow in three stages: pre-processing, calculation, post-processing. Labels: geometry description, preparation of analysis data, computer analysis, visualization of results, pre- and post-processor, VELaSSCo.]

Motivations of the project
Simulation data are naturally linked to High Performance Computing.
Simulation has already entered the Big Data area:
  - very traditional supercomputer manufacturers such as SGI
  - companies oriented to a massive number of customers such as Amazon, offering very attractive solutions for simulation software vendors (Elastic Compute Cloud, EC2; Simple Storage Service, S3)
  - well-known simulation suites such as Matlab or OpenFOAM (precisely through Amazon services)
How big is current simulation data? Some examples:
  - weather & climate (400 PB/year, now)
  - nuclear & fusion energy (2 PB/time step now, and 200 PB/time step by 2020)
  - high-energy physics, materials, chemistry, biology, fluid dynamics
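
To put the volumes quoted on this slide in perspective, the following back-of-the-envelope sketch converts 400 PB/year into a sustained data rate and counts how many drives a single 2 PB or 200 PB time step would fill. Decimal units (1 PB = 10^15 bytes) and the 4 TB drive capacity are assumptions chosen for illustration; neither comes from the slides.

```python
import math

# Decimal (SI) units, an assumption: the slides do not say which convention is used.
GB = 10**9
TB = 10**12
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Average rate in GB/s needed to absorb a yearly volume continuously."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / GB


def drives_needed(petabytes: int, drive_capacity_tb: int) -> int:
    """Number of drives of the given capacity needed to hold the volume once."""
    return math.ceil(petabytes * PB / (drive_capacity_tb * TB))


if __name__ == "__main__":
    # Weather & climate: 400 PB/year (figure from the slide).
    print(f"{sustained_rate_gb_per_s(400):.1f} GB/s sustained")
    # Nuclear & fusion: 2 PB per time step now, 200 PB by 2020 (from the slide).
    # The 4 TB drive size is a hypothetical value.
    print(drives_needed(2, 4), "drives for one 2 PB time step")
    print(drives_needed(200, 4), "drives for one 200 PB time step")
```

Even before any replication, a single time step at these sizes spans hundreds of drives, which is the practical reason the results cannot live on one machine.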

Objectives of the project: Target data

                        DEM              FEM
  Total size            50 GB to 1 PB    30 GB to 50 TB
  Partitions            1 to 10,000
  Particles / elements  10 million       8 million to 1 billion
  Time-steps            1 billion        40 to 25,000
  Variables per node    10 variables     2-8 scalars, 1-2 vectors, 1 tensor (?)

Objectives of the project
The volume of data produced by HPC solvers can no longer be stored on a single machine, which makes the following mandatory:
  - Distributed post-processing
  - Distributed visualization
A failed compute node is a problem in HPC, so the data needs redundancy: Big Data.
The main objective of the VELaSSCo project is to build the VELaSSCo Platform, a system that performs distributed post-processing operations and visualization of very large simulations. To address this objective, VELaSSCo brings together Simulation and Big Data.
Develop a platform that targets most IT systems.
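
The redundancy requirement can be illustrated with a minimal sketch: each data partition is copied to several distinct nodes, so losing a node does not make any partition unreadable. HDFS, for instance, keeps three replicas of each block by default. The round-robin placement below and the names `place_replicas` and `readable_partitions` are hypothetical simplifications for illustration; this is not the placement policy of HDFS or of the VELaSSCo platform.

```python
def place_replicas(num_partitions, nodes, replication=3):
    """Assign each partition to `replication` distinct nodes, round-robin.

    Hypothetical policy for illustration only; real systems also take
    racks, free space and load into account.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        p: [nodes[(p + k) % len(nodes)] for k in range(replication)]
        for p in range(num_partitions)
    }


def readable_partitions(placement, failed_nodes):
    """Return the partitions that still have at least one live replica."""
    failed = set(failed_nodes)
    return {
        p for p, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]
    replicated = place_replicas(8, nodes, replication=3)
    unreplicated = place_replicas(8, nodes, replication=1)
    # With three replicas, two simultaneous node failures lose nothing.
    print(sorted(readable_partitions(replicated, ["n0", "n1"])))
    # Without redundancy, one failed node takes its partitions with it.
    print(sorted(readable_partitions(unreplicated, ["n0"])))
```

The cost of this safety is storage: a replication factor of r multiplies the raw capacity needed by r.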

The VELaSSCo architecture: Big Data, what does it mean?
  - "Big Data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze." (McKinsey Global Institute)
  - "Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." (Wikipedia)

The VELaSSCo architecture: Data of Big Data
Usages:
  - Data cleaning
  - Data transformation
  - Data analysis
  - Data search
  - Data computation
  - Data visualization
Heterogeneous data:
  - From sensors
  - Images
  - Media
  - Textual
  - Networks

The VELaSSCo architecture: What are the challenges of Big Data
Scale:
  - Data volume
  - Distribution of computation and storage between different locations
  - Size of network and storage system
Complexity:
  - A wide variety of acquisitions
  - A large set of dimensions
  - Fuzzy data
Heterogeneity:
  - Scientific collaboration between several domains
  - Specific data format
  - Complex workflow computation

The VELaSSCo architecture: Grid vs Cloud
Grids:
  - Owned by the scientific community
  - Batch computation
  - Computation time
  - Widely distributed
Clouds:
  - Mainly owned by industry
  - Simultaneous computations
  - CPU time
  - Can be distributed
  - Heterogeneous system
I. Foster, Y. Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE 08, Nov 2008.

The VELaSSCo architecture: Grid vs Cloud

                        Grids                                  Clouds
  Business model        Project-oriented                       Consumption basis
  Architecture          Five layers                            Four layers; abstract resources; can be implemented over a grid
  Resources management  Batch scheduling; shared file system   Shared by all users; specific FS
  Programming model     Workflow tools                         Map-Reduce
  Application model     Any                                    Any; difficulties with HPC problems
  Security model        Strict security                        Strong security

I. Foster, Y. Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE 08, pages 1-10, Nov 2008.

The VELaSSCo architecture: Big Data needs a distributed system
The most suitable computational model for Big Data: MapReduce
  - Designed for large distributed systems
  - A simple programming model
  - Based on a specific FS
  - Designed to scale up
  - High availability
  - Deals with node failures
  - Batch computation
But this model has evolved (Hadoop 2.0):
  - More complex computation
  - Management of resources
  - A data-oriented operating system
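
The MapReduce programming model named on this slide can be sketched in a few lines of plain Python. The example computes, for each time step, the maximum value of one simulation variable across all partitions. The record layout `(time_step, node_id, value)` and the function names are assumptions made for illustration; the sketch runs in a single process, whereas Hadoop would run the map tasks on the nodes that hold each partition and move the grouped pairs over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Map: emit one (time_step, value) pair per record of a partition."""
    for time_step, _node_id, value in partition:
        yield time_step, value


def shuffle(pairs):
    """Shuffle: group all emitted values by key (here, the time step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key to a single result."""
    return key, max(values)


def run_job(partitions):
    """Run map, shuffle and reduce in sequence over all partitions."""
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    # Two hypothetical partitions of (time_step, node_id, value) records.
    partition_a = [(0, 1, 0.5), (0, 2, 1.5), (1, 1, 0.7)]
    partition_b = [(0, 3, 0.9), (1, 3, 2.1), (1, 4, 0.2)]
    print(run_job([partition_a, partition_b]))
```

Because each map call sees only its own partition and each reduce call sees only one key, both phases can be spread over many machines without coordination, which is what makes the model simple to scale.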

The VELaSSCo architecture: Simplified version
[Figure: simplified version of the VELaSSCo architecture, showing alternative components ("or") and a Visualisation stage.]

The VELaSSCo architecture
[Figure: detailed architecture diagram. Labels recovered from the diagram:]
  - VELaSSCo.Platform.Access.lib: Visualization client
  - VELaSSCo.Engine.Layer (YARN):
      - Query Manager Module (asynchronous)
      - Monitoring: availability, resources, load, etc.
      - Graphics: compression / streaming, GPU struct
      - Analytics: LOD, D2C, iso, stream, stats
  - VELaSSCo.Data.Layer: RT Storage Module, Batch, Data Query
  - VELaSSCo.Platform.DataIngestion.lib: Simulation, Ingestion & Processing (Flume)
  - Storage: HBase, Hive, Phoenix, FS, HadoopAbstractFileSystem, HDFS, NFS, EDM, EDM plug-in
  - Legend: Existing software; To develop; Results / data flow; Queries flow; Consortium; Open; Commercial version

Conclusions
A Big Data platform for engineering data (FEM and DEM simulation):
  - With support for visualisation tools: GiD (CIMNE), ifx (Fraunhofer IGD)
  - With support for real-time queries
A Big Data architecture for any IT system:
  - For example, it can co-exist with an HPC cluster
  - Extensible (supports plug-ins)
A database engine, based on widely used technologies such as Hadoop-HBase and ISO 10303 STEP, that can organise and store a diverse range of large-scale simulation data sets for collaborative use.
An innovative approach, adopting Big Data best practices, to handle large-scale simulation data sets that have to be stored on multiple servers.
A framework equipped with advanced in-situ processing tools to analyse the output of parallel distributed simulation solvers.
An analysis platform to analyse and visualize large-scale data sets interactively. This builds on leading-edge graphics hardware.

Thank you for your attention.
More information is available at http://www.velassco.eu
You can contact me at: benoit.lange@inria.fr