Big Data and Cloud Computing for GHRSST

1 Big Data and Cloud Computing for GHRSST
Jean-Francois Piollé, Frédéric Paul, Olivier Archer
CERSAT / Institut Français de Recherche pour l'Exploitation de la Mer

2 Facing data deluge
Today's LTSRF archive: 49 TB
Drivers of growth:
- increasing number of operational satellites, including forthcoming Chinese and Indian programs
- increasing sensor spatial and temporal resolution
Challenges:
- How to allow a high revisit rate of historical (and present) data?
- How to perform data-intensive processing?
- How to afford a large online archive?
- How to transfer data to users?
- How to store data locally?
Bottlenecks: storage, processing, network
Can new big data and cloud computing technologies help with that?

3 How to cope with data volume?
- Use of high-resolution data is fine for case studies, which involve a limited amount of data
- Current solution for long time series: generation of high-level fusion products (L3 / L4)
  - involves data transformation: averaging, smoothing, ...
  - suitable for some applications only
- What about more data-intensive applications?
  - highest spatial and temporal resolution
  - feature detection (fronts, eddies, ...)
  - data merging and synergy
Are massive, central, static and one-way archive centers still relevant?
[Diagram: Data center (data: Tbytes!) / User (processing: Mbytes)]

5 Main aspects to consider
Layers:
- Data analysis
- User services
- Cloud computing
- Virtualization
- Workflow management
- Data organization and format
- File system
- Storage (hardware)
Big data: a very confusing term
- How to deal with the growth and complexity of data volumes in order to extract relevant information quickly?
- An approach to design, and strategies, for large volumes of data
- Issues with data management, organization, storage, processing
Cloud computing: also very confusing
- In our context: offering flexible remote processing capability
- Virtualization + dynamic allocation of resources

6 Storage
- Online storage on disk is required for archives (restoration from tapes: 500 GB / day)
- Which technologies to consider?
  - Big data centers (Google, Facebook, ...) rely on cheap hardware => weaker reliability, balanced by duplication/redundancy
  - Strongly inter-related with the file system (e.g. management of redundancy, distribution, ...)
  - Connection strategy with processing nodes to be considered (data-intensive architecture that takes data topology into account to distribute jobs "closest to the data")
- Aim: processing and network performance while keeping a low budget
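To see why the tape restoration rate argues for online disk storage, it helps to combine it with the archive size quoted on slide 2. The short sketch below is only illustrative arithmetic: it assumes the whole 49 TB archive would have to be restored at a constant 500 GB / day and uses decimal units (1 TB = 1000 GB). The function name `restore_days` is hypothetical.

```python
def restore_days(archive_tb: float, rate_gb_per_day: float) -> float:
    """Days needed to restore an archive from tape at a fixed daily rate."""
    # Assumption: decimal units, 1 TB = 1000 GB.
    return archive_tb * 1000.0 / rate_gb_per_day


# Figures from the slides: 49 TB archive, 500 GB / day from tape.
days = restore_days(49, 500)
print(f"{days:.0f} days")  # 98 days
```

Under these assumptions a single full pass over the archive would take more than three months of tape restoration alone, before any processing starts.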

7 File systems
Parallel and distributed
Large volume: disk cluster seen as one virtual space
MooseFS
- Simple administration
- Scalability
- Reliability and robustness (redundancy implemented through replicates, and soon parity bit)
- No quota (soon)
Lustre
- Complex administration (scalability, ...)
- No redundancy
- Bad fault tolerance
GlusterFS
- Complex maintenance and administration
- Bad reliability
- Not suitable for large number of files
HDFS (Hadoop)
- Performance for streaming and massive distributed processing
- Requires specific API for data access
- Hadoop optimized for key/value data structure, not image/swath type structure
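The difference between redundancy through replicates and redundancy through parity shows up directly in usable capacity. The sketch below is a generic illustration, not a description of how any of the file systems above actually lay out data: the function names, the 100 TB raw figure, the two-copy setting and the 8 data + 1 parity layout are all assumptions chosen for the example.

```python
def usable_replicated(raw_tb: float, copies: int) -> float:
    """Usable capacity when every block is stored `copies` times."""
    return raw_tb / copies


def usable_parity(raw_tb: float, data_blocks: int, parity_blocks: int) -> float:
    """Usable capacity when each stripe holds data blocks plus parity blocks."""
    return raw_tb * data_blocks / (data_blocks + parity_blocks)


raw = 100.0  # hypothetical raw disk capacity in TB
print(usable_replicated(raw, copies=2))                 # 50.0
print(round(usable_parity(raw, data_blocks=8, parity_blocks=1), 1))  # 88.9
```

With these illustrative settings, replication halves the usable space, while a parity scheme keeps most of it at the cost of more work when a disk has to be rebuilt.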

8 Cloud computing
Providing remote access and resources to users
Previous solutions:
- Ssh to server: limited allocation of resources (unmanageable), strong security issues
- Ssh to supercomputer
  - expensive solution for data-intensive applications (no communication between processing nodes)
  - strong environment constraints => specific system/software/libs/...
  - often not at the same location as the data centers
- Grid technology
  - quite complex to use
  - strong environment constraints => specific system/software/libs/...

9 Cloud computing
Virtualization
=> deploy a user-dedicated and customized system environment (OS, libraries, software, ...)
=> remote machine close to the user's familiar environment
Cloud computing
=> management of resources, allocation/deployment of virtual servers
- IaaS: infrastructure
- PaaS: platform (server + tools for processor integration, scheduling or reprocessing tasks, ...)
- SaaS: software
=> sustainability of processing environments
Private/public clouds
=> public clouds (Amazon S3, ...): expensive (to be revised according to Ken), not adapted for large volumes of data, concerns with sustainability
=> private cloud: restricted to within the institute
=> hybrid clouds: private cloud with controlled access for external users. Security issues to be solved.

10 CERSAT Nephelae platform
- Data analysis / User services: access through ssh. Possible remote desktop with
- Cloud computing: OpenStack, inherited from Nebula (tried also Eucalyptus)
- Virtualization: KVM; Ubuntu / CentOS w/ Matlab, scientific python
- Workflow management: PBS Pro, Torque, Maui; data topology not taken into account
- Data organization and format: netcdf4; conversion effort for existing datasets; 15.8 TB for GHRSST
- File system: MooseFS, full replication
- Storage (hardware): 400 TB, 414 cores

11 Feedback and experiences
Engineering perspective
1. Cost of commercial solutions, and lack of optimization of the storage vs processing strategy
2. Reliability (not losing any data) varies from one file system to another. Longer assessment (and mistakes) is needed.
3. Virtualization and input/output performance: drop by 50 %, about to be solved
4. Still completely to be addressed: using storage topology to distribute processing to the closest node
5. Access security issues when opening to external users
6. Stability status of most components, lack of documentation
7. Lack of available expertise for our specific requirements
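Item 4 above, distributing processing to the node closest to the data, can be pictured with a small greedy placement routine. This is a minimal sketch under stated assumptions, not the scheduler used on the platform (the slides say data topology is not yet taken into account): the function `assign_jobs`, the replica map, the node names and the slot counts are all hypothetical.

```python
def assign_jobs(file_locations, node_slots):
    """Greedy, locality-aware placement of one job per file.

    file_locations: {path: [nodes holding a replica of that file]}
    node_slots:     {node: maximum number of jobs it may run}
    Returns {path: (chosen_node, is_local)}.
    """
    load = {node: 0 for node in node_slots}
    placement = {}
    for path, replicas in sorted(file_locations.items()):
        # Prefer nodes that already hold the file and still have a free slot.
        local = [n for n in replicas if n in load and load[n] < node_slots[n]]
        # Otherwise fall back to any node with a free slot (remote read).
        candidates = local or [n for n in load if load[n] < node_slots[n]]
        if not candidates:
            raise RuntimeError("no free slot left for " + path)
        # Least-loaded candidate first; node name breaks ties deterministically.
        node = min(candidates, key=lambda n: (load[n], n))
        placement[path] = (node, node in replicas)
        load[node] += 1
    return placement


# Hypothetical replica map and slot counts.
files = {"a.nc": ["n1", "n2"], "b.nc": ["n1"], "c.nc": ["n3"], "d.nc": ["n1"]}
slots = {"n1": 2, "n2": 1, "n3": 1}
placement = assign_jobs(files, slots)
for path, (node, is_local) in placement.items():
    print(path, node, "local" if is_local else "remote")
```

In this example three of the four jobs run where their file is stored; the last one falls back to a remote read because the only node holding its file is already full.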

12 Feedback and experiences
Usage perspective
1. Used for reprocessing campaigns
   => deployment of an external partner's processor on a platform matching the developer's requirements, then reprocessing
   => also allowed us to save the processing environments and replay part of the reprocessing later under exactly the same conditions
   => continuous reprocessing capability
2. Sandbox for various project contributors using and sharing the same data
   => product intercomparison and merging
   => test of new algorithms, perturbing initial conditions or settings
3. Systematic analysis of a dataset
   => detection of features in SST images
   => conversion to NetCDF4
The batch processing tools we have implemented (which take a list of data files as input) have been a great help.
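A batch tool that takes a list of data files as input typically starts by splitting that list into per-job chunks. The sketch below shows only that splitting step, as an assumption about how such a tool might be organized; it is not the CERSAT implementation, and the function name `chunk_file_list` and the file names are hypothetical.

```python
def chunk_file_list(paths, files_per_job):
    """Split a list of data files into consecutive chunks, one chunk per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [paths[i:i + files_per_job]
            for i in range(0, len(paths), files_per_job)]


# Hypothetical input list of five files, two files per batch job.
inputs = [f"sst_{i:03d}.nc" for i in range(5)]
jobs = chunk_file_list(inputs, files_per_job=2)
for number, chunk in enumerate(jobs):
    print(number, chunk)
```

Each chunk could then be handed to the batch scheduler as one job, so the same processor runs unchanged over a whole dataset.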

13 Questions for GHRSST
- These technologies are quite new and unstable. Limited real expertise is available and technical challenges are yet to be tackled, especially for scientific data, but many initiatives are popping up (physics, space agencies, ...).
- Is it a new paradigm for data centers?
  - It will only help with some applications: not an answer to everything (traditional technologies still work)!
  - A complementary tool to current data center services
- GHRSST should be concerned about capability building around its data heritage and about the user services for the exploitation of past data => from a user perspective (not a data producer's)
- What are the experience and prospects at the main GHRSST data nodes (PODAAC, NODC)? => Necessary to share and possibly homogenize or interconnect the available services
- Should these aspects be part of the GHRSST strategic plan?


More information

Cloud Computing Summary and Preparation for Examination

Cloud Computing Summary and Preparation for Examination Basics of Cloud Computing Lecture 8 Cloud Computing Summary and Preparation for Examination Satish Srirama Outline Quick recap of what we have learnt as part of this course How to prepare for the examination

More information

Virtualization with Windows

Virtualization with Windows Virtualization with Windows at CERN Juraj Sucik, Emmanuel Ormancey Internet Services Group Agenda Current status of IT-IS group virtualization service Server Self Service New virtualization features in

More information

Storage solutions for a. infrastructure. Giacinto DONVITO INFN-Bari. Workshop on Cloud Services for File Synchronisation and Sharing

Storage solutions for a. infrastructure. Giacinto DONVITO INFN-Bari. Workshop on Cloud Services for File Synchronisation and Sharing Storage solutions for a productionlevel cloud infrastructure Giacinto DONVITO INFN-Bari Synchronisation and Sharing 1 Outline Use cases Technologies evaluated Implementation (hw and sw) Problems and optimization

More information

2) Xen Hypervisor 3) UEC

2) Xen Hypervisor 3) UEC 5. Implementation Implementation of the trust model requires first preparing a test bed. It is a cloud computing environment that is required as the first step towards the implementation. Various tools

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

How To Compare Cloud Computing To Cloud Platforms And Cloud Computing

How To Compare Cloud Computing To Cloud Platforms And Cloud Computing Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Cloud Platforms

More information

DISTRIBUTED MINING ALGORITHM USING HADOOP ON LARGE DATA SET

DISTRIBUTED MINING ALGORITHM USING HADOOP ON LARGE DATA SET DISTRIBUTED MINING ALGORITHM USING HADOOP ON LARGE DATA SET Ms. E. Suganya PG Scholar, Computer Science and Engineering, Nandha College of Technology, Perundurai, Tamilnadu, India. Abstract Cloud computing,

More information

MapReduce and Hadoop Distributed File System V I J A Y R A O

MapReduce and Hadoop Distributed File System V I J A Y R A O MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB

More information

Virtual Machine Based Resource Allocation For Cloud Computing Environment

Virtual Machine Based Resource Allocation For Cloud Computing Environment Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE

More information

High Performance Computing Cloud Computing. Dr. Rami YARED

High Performance Computing Cloud Computing. Dr. Rami YARED High Performance Computing Cloud Computing Dr. Rami YARED Outline High Performance Computing Parallel Computing Cloud Computing Definitions Advantages and drawbacks Cloud Computing vs Grid Computing Outline

More information

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Viswanath Nandigam Sriram Krishnan Chaitan Baru Viswanath Nandigam Sriram Krishnan Chaitan Baru Traditional Database Implementations for large-scale spatial data Data Partitioning Spatial Extensions Pros and Cons Cloud Computing Introduction Relevance

More information

Cornell University Center for Advanced Computing

Cornell University Center for Advanced Computing Cornell University Center for Advanced Computing David A. Lifka - lifka@cac.cornell.edu Director - Cornell University Center for Advanced Computing (CAC) Director Research Computing - Weill Cornell Medical

More information

University of Messina, Italy

University of Messina, Italy University of Messina, Italy IEEE MoCS 2011 Kerkyra - Greece June 28, 2011 Dr. Massimo Villari mvillari@unime.it Cross Cloud Federation Federated Cloud Scenario Cloud Middleware Model: the Stack The CLEVER

More information

Behind the scene III Cloud computing

Behind the scene III Cloud computing Behind the scene III Cloud computing Athens, 15.11.2014 M. Dolenc / R. Klinc Why we do it? Engineering in the cloud is a combina3on of cloud based services and rich interac3ve applica3ons allowing engineers

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Cloud 101. Mike Gangl, Caltech/JPL, michael.e.gangl@jpl.nasa.gov 2015 California Institute of Technology. Government sponsorship acknowledged

Cloud 101. Mike Gangl, Caltech/JPL, michael.e.gangl@jpl.nasa.gov 2015 California Institute of Technology. Government sponsorship acknowledged Cloud 101 Mike Gangl, Caltech/JPL, michael.e.gangl@jpl.nasa.gov 2015 California Institute of Technology. Government sponsorship acknowledged Outline What is cloud computing? Cloud service models Deployment

More information

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy Derrick Kondo INRIA, France Outline Cloud Grid Volunteer Computing Cloud Background Vision Hide complexity of hardware

More information

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents

More information

Cloud Courses Description

Cloud Courses Description Courses Description 101: Fundamental Computing and Architecture Computing Concepts and Models. Data center architecture. Fundamental Architecture. Virtualization Basics. platforms: IaaS, PaaS, SaaS. deployment

More information

Cloud Storage and Backup

Cloud Storage and Backup Cloud Storage and Backup Cloud Storage and Backup Cloud Storage and Backup services from iomartcloud have been designed to deliver the performance, capacity, security and flexibility needed to address

More information