Life sciences big data e-infrastructure concepts

Transcription:

Life sciences big data e-infrastructure concepts
Maarten Kooyman
2014-11-13 Tue

About me: Knowing each other helps to communicate

Education: Knowing each other helps to communicate

Work: Knowing each other helps to communicate

Current Work: Knowing each other helps to communicate

Cartesius: the national supercomputer

Lisa: the national compute cluster

Grid: The grid infrastructure

Hadoop: Big Data analytics framework

HPC Cloud: The cloud computing infrastructure

Discussion: And will continue during coffee

Credits
- Profile by Marek Polakovic from The Noun Project
- Graduate by Wilson Joseph from The Noun Project
- Graduate by Wilson Joseph from The Noun Project
- User by Wilson Joseph from The Noun Project
- Superhero by Moriah Rich from The Noun Project
- Cow by Alessandro Suraci from The Noun Project
- Grid by Sblendone from The Noun Project
- divide by Lorena Salagre from The Noun Project
- Adventure by Ben Markoch from The Noun Project
- Cloud by Lil Squid from The Noun Project
All icons on this list are licensed under Creative Commons Attribution

Cartesius: the national supercomputer
Communicate fast between jobs
- 15008 cores
- 132 GPUs
- 2.6 or * GB/core
- low latency network
Coming soon!
- GPU2GPU direct communication
- More nodes with AVX2 instructions
- System grows over time

Lisa: the national compute cluster
Large simple cluster:
- 8960 cores
- 3.5 GB/core
- NFS home drive
Coming soon!
- Intel Xeon Phi
- unified data storage with most SURFsara systems

Grid: The grid infrastructure
Massive embarrassingly parallel calculations.
Under SURFsara administration:
    cluster   cores   memory
    LSG       1350    4 GB
    Gina      3024    4 or 8 GB
- filesystem: fast local storage (few TB), gigantic global storage (5.5 PB used)
- Available cores on lsgrid VO: 17590 (SURFsara / Nikhef / RUG-CIT)
- Upscaling to the European Grid Infrastructure (EGI) possible
Coming soon!
- Newer nodes (AVX2, 8 GB memory/core)
- New documentation
- CernVM-FS
- Research: Docker on grid
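
A minimal sketch of what "embarrassingly parallel" means in practice: every job depends only on its own input, so jobs can run in any order and on any worker without talking to each other. The Monte Carlo task, the names `count_hits` and `estimate_pi`, and the use of a local thread pool are all assumptions made for illustration; on the grid each job would instead be submitted as a separate process on a worker node.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(job_id, samples=10_000):
    """One independent job: count random points that fall inside the
    unit quarter circle. The job id doubles as the random seed, so the
    job shares no state with any other job (hypothetical example task)."""
    rng = random.Random(job_id)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_jobs, samples=10_000, workers=4):
    """Run n_jobs independent jobs and combine their results at the end.
    The thread pool is only a local stand-in for grid job submission."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda j: count_hits(j, samples), range(n_jobs)))
    # The only step that needs all results is this final, cheap merge.
    return 4.0 * sum(hits) / (n_jobs * samples)


if __name__ == "__main__":
    print(estimate_pi(n_jobs=8))
```

Because the jobs never communicate, the result is the same whether they run on one worker or many; only the wall-clock time changes.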

Hadoop: Big Data analytics framework
Divide and Conquer: distributed data and calculations
- 720 cores
- 8 GB/core
- 600 TB cluster filesystem
Coming soon!
- Roughly double the number of nodes
- More than double the storage
- HBase as production service
- Hortonworks 1.3
- ElasticSearch
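
The "divide and conquer" idea on this slide can be illustrated with a toy map/reduce in plain Python. This is a sketch only: it does not use Hadoop or any of its APIs, and the k-mer counting task and the function names (`map_kmers`, `merge_counts`, `count_kmers`) are assumptions chosen for illustration rather than anything taken from the talk.

```python
from collections import Counter
from functools import reduce


def map_kmers(sequence, k=3):
    """Map step: count the k-mers of ONE sequence. Each call is
    independent, so calls could run wherever the data chunk is stored."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def merge_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_kmers(sequences, k=3):
    """Divide the input into per-sequence map tasks, then conquer by
    merging the partial results (run serially here for simplicity)."""
    partial_counts = [map_kmers(seq, k) for seq in sequences]
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    totals = count_kmers(["ACGTAC", "GTACG"])
    for kmer, count in sorted(totals.items()):
        print(kmer, count)
```

The map step touches only local data and the reduce step only small partial counts, which is the property that lets a framework of this kind spread both the data and the calculation over many nodes.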

HPC Cloud: the cloud computing infrastructure
Maximum flexibility: be your own admin!
- 1280 cores
- 8 GB memory
- filesystem: NFS
Coming soon!
- Big Memory node (2 TB)
- GPUs
- OpenNebula 4.x
- 3 different filesystems (NFS, Ceph, local)
- Beefier NFS servers
- SSD for some local space
- Research: optimise virtualisation switches for KVM