Bringing Compute to the Data: Alternatives to Moving Data. Part of EUDAT's Training in the Fundamentals of Data Infrastructures



2 Introduction: Why consider alternatives? The traditional approach. Alternative approaches: distributed computing; workflows; bringing the compute to the data.

3 Why should alternative approaches be considered? Moving data is still hard, even when you're using the right tools. Data volumes are expected to continue to increase, and to do so more rapidly than transfer speeds. Alternatives require thinking about things differently, so it may be wise to start thinking about them before current techniques break down.

4 Traditional Approach: Input data is stored at location A; the compute resource is at location B; output data is required at location C. 1. Move data from A to B. 2. Perform computation at B. 3. Move data from B to C. (A and C are often the same place.) [Diagram: A, B, C]
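Under the simplest assumptions (the three steps run strictly one after another, and each transfer takes size divided by sustained bandwidth, ignoring latency, protocol overhead and contention), the cost of the traditional approach can be sketched as below. The class and function names and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """One data movement: size in gigabytes, sustained bandwidth in GB per second."""
    size_gb: float
    bandwidth_gb_per_s: float

    def seconds(self) -> float:
        # Idealised: no latency, no protocol overhead, no contention.
        return self.size_gb / self.bandwidth_gb_per_s


def traditional_time(move_in: Transfer, compute_s: float, move_out: Transfer) -> float:
    """Total wall-clock time for the three sequential steps:
    1. move input from A to B, 2. compute at B, 3. move output from B to C."""
    return move_in.seconds() + compute_s + move_out.seconds()


if __name__ == "__main__":
    # Hypothetical figures: 10 TB of input over a 1 GB/s link,
    # one hour of compute, then 10 GB of output over the same link.
    t = traditional_time(Transfer(10_000, 1.0), 3600, Transfer(10, 1.0))
    print(f"{t:.0f} s")  # 10000 + 3600 + 10 = 13610 s
```

With these made-up numbers the input transfer alone takes longer than the computation, which is the situation the alternatives below try to avoid.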

5 Traditional approach [Diagram: Data, Compute]

6 Alternative Approaches: A Disclaimer. None of the following approaches provides a silver bullet! Not all approaches will be useful for all problems, and in some cases using them can make things worse. They should complement existing approaches and be used where appropriate.

7 Distributed Computing: Here, the idea is that you might not need to do all of the compute at B. Depending on your data transfer pattern, this approach could make things worse, and it will not be suitable for all kinds of problem. Many of the considerations here are traditional parallel computing concepts.

8 Distributed Computing as Parallel Computing: Is the problem trivially parallel? Is it possible to solve parts of the problem using only part of the input data, and simply recombine the output at the end of a run? If all processors have access to all the data at the start, can they then proceed with little or no communication during the run? If there is a need to communicate during a run, how intensive are these communications? Do you have all-to-alls?
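The simplest case, a trivially parallel problem, can be sketched as follows, assuming the work on each chunk is independent and the partial results can be merged by addition. A histogram stands in for the real computation, and threads stand in for separate compute sites; all names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Divide the input into n contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work that needs only its own part of the input: here, a histogram."""
    return Counter(chunk)


def run(data, n_workers=4):
    chunks = split(data, n_workers)
    # No communication between workers while they run...
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # ...and the partial outputs are simply recombined at the end.
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total
```

For example, `run(["a", "b", "a", "c", "b", "a"], n_workers=3)` counts three `a`, two `b` and one `c`. A problem that needed all-to-all exchanges between the chunks would not fit this shape.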

9 When might Distributed Computing be a good alternative? When input data starts off distributed (fairly common with large-scale experimental data from sensors, detectors, etc.). When input data is already mirrored. When you've had to move the data before anyway, and could have moved it to multiple places instead of just one. When the computation is trivially parallelisable or requires only limited communication.

10 [Diagram: A; B1, B2, B3, B4; C]

11 [Diagram: A1, A2, A3, A4; B1, B2, B3, B4; C]

12 [Diagram: A1, A2, A3, A4; B1, B2, B3, B4; C]

13 [Diagram: A1; B1, B2; C]

14 [Diagram: A1, A2; B1, B2, B3; C]
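The arrangements in these diagrams can be compared with a crude data-movement model. This is a sketch under stated assumptions, not a general cost model: the output is taken to be a fixed fraction of the input, recombining at C is assumed to be cheap, and `exchange_gb` is a hypothetical parameter for the data the compute sites must swap during the run.

```python
def gb_moved_centralised(inputs_gb, output_fraction):
    """Every input Ai is shipped to a single compute site B; the output
    (assumed to be a fixed fraction of the input) then travels from B to C."""
    total_in = sum(inputs_gb)
    return total_in + total_in * output_fraction


def gb_moved_distributed(inputs_gb, output_fraction, exchange_gb=0.0):
    """Each input Ai is processed at a co-located site Bi, so inputs never
    cross the network; only the partial outputs travel to C to be recombined.
    exchange_gb is the data the Bi sites exchange during the run: zero for a
    trivially parallel problem, potentially very large for all-to-alls."""
    return sum(size * output_fraction for size in inputs_gb) + exchange_gb
```

For two 100 GB inputs whose output is half their size, the centralised plan moves 300 GB and the distributed plan 100 GB; add 1,000 GB of inter-site exchange and the distributed plan moves 1,100 GB, which is how distribution can make things worse.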

15 Is this Grid Computing? There are definite overlaps between these ideas of distributed computing and the grid computing that promised so much in the last decade. Grid is not such a cool topic any more, but many of its ideas could be reused in different contexts (possibly hidden from the end user). This way of computing may still come into its own for certain kinds of big data problems.

16 Scientific Computation in the Cloud? It is likely to be a while before this gets close to existing approaches in terms of efficiency, but it is already being used in some places, e.g. Amazon offers Cluster Compute and Cluster GPU instances (see …). Some data sets are already in the cloud, e.g.: Annotated Human Genome Data provided by ENSEMBL; various US Census databases from the US Census Bureau; UniGene, provided by the National Center for Biotechnology Information; the Freebase data dump from Freebase.com.

17 Big Input Data: This is likely to become more common as more and more data is stored and made available for re-use, and projects like EUDAT will make it easier to access stored data. This will be the case for much data-intensive science, a term I use here in the sense of the "fourth paradigm": computers as datascopes.

18 Workflows: Related to distributed computing, and sometimes referred to as "programming in the large". Again, this potentially requires more data movement. The idea is to break the computation down so that some of it can be done at A, some at B, and some at C. Also, instead of doing everything at B, the work could be done at B1, B2, B3, B4, and so on.

19 Simple Motivating Example [Diagram: Big Input Data at A; B; Small Output Data at C]

20 [Diagram: A, A; B1, B, B2; C, C]
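The benefit in the motivating example comes from where each stage runs. A minimal sketch, assuming a linear workflow and counting only the data that crosses a site boundary between consecutive stages; the `Stage` type, stage names and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    site: str       # where the stage runs, e.g. "A", "B" or "C"
    out_gb: float   # size of the data this stage produces


def gb_moved(input_site, input_gb, stages):
    """Sum the data that must be shipped because a stage runs at a
    different site from the step that produced its input."""
    moved, site, size = 0.0, input_site, input_gb
    for stage in stages:
        if stage.site != site:
            moved += size  # previous output crosses the network
        site, size = stage.site, stage.out_gb
    return moved


# 1000 GB of big input data at A, small output required at C.
everything_at_b = [Stage("filter", "B", 10), Stage("analyse", "B", 1), Stage("report", "C", 1)]
filter_at_a = [Stage("filter", "A", 10), Stage("analyse", "B", 1), Stage("report", "C", 1)]
```

Here `gb_moved("A", 1000, everything_at_b)` gives 1001 GB, while moving only the filtering step to A gives `gb_moved("A", 1000, filter_at_a)` of 11 GB, at the cost of needing compute at A and a split algorithm.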

21 or a more realistic case? Image Source:

22 Difficulties with this approach: A change to the computation algorithm is likely; this is a trade-off, but it might only need to be done once. Orchestration: coordinating computation at multiple sites. Workflows can help with this, and can help to address the added complexities of multiple jurisdictions / access policies, job scheduling and automation.

23 Approaches to orchestration: Local: each compute service works independently. Data can be pushed or pulled between services (or some combination). The route that the data should take can be passed with the data, predetermined at the service, or communicated manually to the service for each run. Orchestrated: the usual workflow approach. A workflow engine communicates with services or processing elements to control data flow.
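The orchestrated approach can be illustrated with a toy in-process engine: it holds the dependency graph and decides when each step runs and which data it receives. This is a minimal sketch, assuming every step is a local Python callable (a real engine would invoke remote services at sites A, B and C); it uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+), and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, deps):
    """Run each task once all of its inputs are available.

    tasks: maps a task name to a callable; the callable receives the outputs
           of its dependencies, in the order they are listed in deps.
    deps:  maps a task name to the list of task names it depends on.
    """
    graph = {name: deps.get(name, []) for name in tasks}
    results = {}
    # static_order() yields the tasks so that every dependency comes first.
    for name in TopologicalSorter(graph).static_order():
        inputs = [results[d] for d in graph[name]]
        # A real engine would call a remote service at the task's site here.
        results[name] = tasks[name](*inputs)
    return results


# Hypothetical three-site workflow: reduce the data at A, analyse at B, report at C.
tasks = {
    "fetch_at_A": lambda: list(range(10)),
    "filter_at_A": lambda xs: [x for x in xs if x % 2 == 0],
    "sum_at_B": lambda xs: sum(xs),
    "report_at_C": lambda total: f"total={total}",
}
deps = {
    "filter_at_A": ["fetch_at_A"],
    "sum_at_B": ["filter_at_A"],
    "report_at_C": ["sum_at_B"],
}
```

Calling `run_workflow(tasks, deps)` produces `"total=20"` for `report_at_C`. In the local approach, by contrast, no component holds this graph: each service knows only where to send or fetch its own data.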

24 An aside: Push & Pull. Push: Service 1 completes processing. Service 1 makes a call to service 2 and sends the data to service 2. The arrival of data triggers service 2 to run. Pull: Service 1 runs and stores its output locally. Service 2 runs (triggered manually). Service 2 initiates data transfer from service 1. [Diagram: Service 1 and Service 2, shown for push and for pull]
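The difference between the two patterns is who initiates the transfer. A sketch with a hypothetical in-process `Service` class standing in for remote services; method names are illustrative only.

```python
class Service:
    """Hypothetical stand-in for a remote processing service."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.stored_output = None  # output kept locally (pull pattern)
        self.downstream = None     # next service, set only for push

    def receive(self, data):
        # Push: the arrival of data is what triggers this service to run.
        return self.process(data)

    def process(self, data):
        out = self.func(data)
        if self.downstream is not None:
            # Push: on completion, call the next service and send it the data.
            return self.downstream.receive(out)
        # Otherwise keep the output locally until someone fetches it.
        self.stored_output = out
        return out

    def pull_from(self, upstream):
        # Pull: this service is started separately (e.g. manually) and
        # initiates the data transfer from the upstream service itself.
        return self.process(upstream.stored_output)


# Push: service 1 drives the whole chain.
s1, s2 = Service("s1", lambda x: x * 2), Service("s2", lambda x: x + 1)
s1.downstream = s2
pushed = s1.process(5)

# Pull: service 1 runs and stores; service 2 later fetches.
p1, p2 = Service("p1", lambda x: x * 2), Service("p2", lambda x: x + 1)
p1.process(5)
pulled = p2.pull_from(p1)
```

Both patterns give the same answer (11 here); they differ in which side must be reachable and which side decides when the transfer happens, which matters when sites sit behind different access policies.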

25 Workflow Engines. Scientific workflows: Kepler, Taverna, Triana, Pegasus (Condor), VisTrails, Unicore, OGSA-DAI (for database-oriented flows). General purpose / business oriented: Service Oriented Architecture solutions; BPEL engines, e.g. Oracle BPEL Process Manager, SAP Exchange Infrastructure, WebSphere Process Server; many of these are based on web services. Datacentre oriented: Hadoop (MapReduce), Storm (stream processing).
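Hadoop's MapReduce model is itself a way of bringing the compute to the data: the framework tries to schedule map tasks on the nodes that already hold the relevant input blocks. The sketch below shows only the shape of the programming model (map, shuffle, reduce) in plain Python, using word counting as the example; it does not use Hadoop's actual API, and all function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (key, value) pairs: here (word, 1) for each word in a line.
    return [(word, 1) for word in record.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key.
    return key, sum(values)


def mapreduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In a real deployment each `map_phase` call would run next to its input block, and only the (usually much smaller) shuffled pairs would cross the network.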

26 Moving the Compute to the Data: A more general idea, related to both of the previous approaches. It relies to some extent on having an infrastructure that supports it, and can work particularly well where A and C are the same place.

27 Computing Close To The Data: Relational database systems: send a query as SQL. Virtual machines: send a VM image to a virtualisation environment on a machine which can directly mount the data. Allow a user to submit a script or executable on a machine close to the data. SPARQL endpoints on RDF triple stores. Data services (e.g. as web services) with some API beyond file transfer: prefiltering / transformation / subsetting. Application as a Service.
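The relational-database case can be illustrated with Python's built-in `sqlite3` module. This is a sketch only: an in-memory SQLite database stands in for a remote data centre, the table and its values are made up, and the number of rows returned is used as a crude proxy for the data that would cross the network.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for a data centre's database of sensor readings.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("s1", 1.0), ("s1", 3.0), ("s2", 10.0), ("s2", 20.0), ("s2", 30.0)],
    )
    return conn


def mean_by_moving_data(conn):
    # Traditional: fetch every row, then compute locally.
    rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
    sums, counts = {}, {}
    for sensor, value in rows:
        sums[sensor] = sums.get(sensor, 0.0) + value
        counts[sensor] = counts.get(sensor, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}, len(rows)


def mean_by_sending_query(conn):
    # Compute close to the data: ship the query, get back only the answer.
    rows = conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the per-sensor means `{"s1": 2.0, "s2": 20.0}`, but the first transfers all five rows and the second only two; with billions of rows the gap is what relieves the network.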

28 Implications for Data Centres: These approaches rely on data centres to provide computational resources and services. Cons: an interface is required to accept a query or compute job; compute/processing resources are required. Pros: less strain on the network.

29 Conclusions: Data movement will always be required, and moving large amounts of data is never likely to be easy. There is no single solution, but considering alternative approaches to big data problems may help you to solve problems and answer questions that would otherwise have been impossible.

30 Acknowledgements: These slides were produced by Adam Carter (EPCC, The University of Edinburgh) as part of the EUDAT project (www.eudat.eu). © 2014 The University of Edinburgh. You are welcome to re-use these slides under the terms of CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/).


More information

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A

Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A Identifier: Date: Activity: Authors: Status: Link: Cloud-pilot.doc 12-12-2010 SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A J O I N T A C T I O N ( S A 1, J R A 3 ) F I

More information

Sriram Krishnan, Ph.D. sriram@sdsc.edu

Sriram Krishnan, Ph.D. sriram@sdsc.edu Sriram Krishnan, Ph.D. sriram@sdsc.edu (Re-)Introduction to cloud computing Introduction to the MapReduce and Hadoop Distributed File System Programming model Examples of MapReduce Where/how to run MapReduce

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams Neptune A Domain Specific Language for Deploying HPC Software on Cloud Platforms Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams ScienceCloud 2011 @ San Jose, CA June 8, 2011 Cloud Computing Three

More information

Basic Scheduling in Grid environment &Grid Scheduling Ontology

Basic Scheduling in Grid environment &Grid Scheduling Ontology Basic Scheduling in Grid environment &Grid Scheduling Ontology By: Shreyansh Vakil CSE714 Fall 2006 - Dr. Russ Miller. Department of Computer Science and Engineering, SUNY Buffalo What is Grid Computing??

More information

Clusterix Dynamic Clusters Administration Tutorial

Clusterix Dynamic Clusters Administration Tutorial Clusterix Dynamic Clusters Administration Tutorial Marcin Pawlik , Jan Kwiatkowski IIS, WIZ, PWr Tutorial outline Clusterix and dynamic clusters

More information

HPC technology and future architecture

HPC technology and future architecture HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr

More information

Cloud Computing An Elephant In The Dark

Cloud Computing An Elephant In The Dark Cloud Computing An Elephant In The Dark Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Cloud Computing 1394/2/7 1 / 60 Amir

More information

DevOps with Containers. for Microservices

DevOps with Containers. for Microservices DevOps with Containers for Microservices DevOps is a Software Development Method Keywords Communication, collaboration, integration, automation, measurement Goals improved deployment frequency faster time

More information

Scaling in the Cloud with AWS. By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com

Scaling in the Cloud with AWS. By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com Scaling in the Cloud with AWS By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com Welcome! Why is this guy talking to us? Please ask questions! 2 What is Scaling anyway? Enabling

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043 INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 0 INFORMATION TECHNOLOGY TUTORIAL QUESTION BANK Name : Cloud computing Code : A60519 Class : III B. Tech II Semester Branch

More information

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai GSAW 2010 Working Group Session 11D Eucalyptus-Based Event Correlation Nehal Desai Member of the Tech. Staff, CSD/CSTS/CSRD, The Aerospace Corporation Dr. Craig A. Lee, lee@aero.org Senior Scientist, CSD/CSTS/CSRD,

More information

Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand

Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand Build a Streamlined Data Refinery An enterprise solution for blended data that is governed, analytics-ready, and on-demand Introduction As the volume and variety of data has exploded in recent years, putting

More information

HPC performance applications on Virtual Clusters

HPC performance applications on Virtual Clusters Panagiotis Kritikakos EPCC, School of Physics & Astronomy, University of Edinburgh, Scotland - UK pkritika@epcc.ed.ac.uk 4 th IC-SCCE, Athens 7 th July 2010 This work investigates the performance of (Java)

More information

High Throughput Computing, Grid Computing, Cloud Computing, Etc. Definitions & Thoughts. Some Definitions

High Throughput Computing, Grid Computing, Cloud Computing, Etc. Definitions & Thoughts. Some Definitions High Throughput Computing, Grid Computing, Cloud Computing, Etc. Definitions & Thoughts Jay Boisseau Texas Advanced Computing Center July 16, 2008 But before the definitions, some disclaimers: These my

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

Evaluating MapReduce and Hadoop for Science

Evaluating MapReduce and Hadoop for Science Evaluating MapReduce and Hadoop for Science Lavanya Ramakrishnan LRamakrishnan@lbl.gov Lawrence Berkeley National Lab Computation and Data are critical parts of the scientific process Three Pillars of

More information

Cloud Computing. Adam Barker

Cloud Computing. Adam Barker Cloud Computing Adam Barker 1 Overview Introduction to Cloud computing Enabling technologies Different types of cloud: IaaS, PaaS and SaaS Cloud terminology Interacting with a cloud: management consoles

More information

Grid Computing Vs. Cloud Computing

Grid Computing Vs. Cloud Computing International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid

More information

Convergence of Big Data and Cloud

Convergence of Big Data and Cloud American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-03, Issue-05, pp-266-270 www.ajer.org Research Paper Open Access Convergence of Big Data and Cloud Sreevani.Y.V.

More information

IAN MASSINGHAM. Technical Evangelist Amazon Web Services

IAN MASSINGHAM. Technical Evangelist Amazon Web Services IAN MASSINGHAM Technical Evangelist Amazon Web Services From 2014: Cloud computing has become the new normal Deploying new applications to the cloud by default Migrating existing applications as quickly

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013 Virtualisation Cloud Computing at the RAL Tier 1 Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013 Virtualisation @ RAL Context at RAL Hyper-V Services Platform Scientific Computing Department

More information

BIG DATA USING HADOOP

BIG DATA USING HADOOP + Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data

More information