The Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia

Size: px
Start display at page:

Download "The Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia"

Transcription

1 The Murchison Widefield Array Data Archive System Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia

2 Agenda Dataflow Requirements Solutions & Lessons learnt Open solution Data ingestion Data distribution Data storage & access Data processing 1

3 Quick Overview On the ground Main Goals: Detection of hydrogen signature from EOR, transient detection, all sky survey, Solar MWA Overview Low frequency interferometer MHz (30 deg FOV) Science goals Detection of hydrogen, Signature from EOR, Transient detection, All sky survey, Solar observations 2048 dipoles antennas Low frequency interferometer MHz 2048 dipoles 16 elements per tile to make 128 tiles (8128 baselines) Virtual beam steering 16 elements per tile to make 128 tiles ( baselines) Longest baseline 3km Dense core short baselines Virtual beam steering Operation started in July 2013 Longest baseline 3km Metal spiders 1 2

4 Tier 0 (MRO) Online processing Online archive 400 MB/s NGAS Client Receiver Correlator PFB Tile Beam former Data Capture Tier 2 Tier 1 Science Archive 10Gbps 1Gbps 10Gbps Mirrored Archive AARNet MIT, USA VUW, New Zealand RRI, India MS Backup 1.44 TB/night Images 18 GB/night QA Raw Visibilities Catalog Catalog NGAS Client Long-term archive ( 3 PB / year) RV MS Measurement Set Raw RV Pawsey, Perth (Disk + Tape) VO local storage Visibilities 7 TB / node RTS images Web CASA pipeline Sky model RTS pipeline Fornax Processing (96 nodes) Scripting UI Image (cutout) & Image cubes 3

5 High-level Requirements of MWA Archive High throughput data ingestion ~400MB/s visibility, MB/s vis + image cube Efficient data distribution to multiple locations Australia / New Zealand / USA / India Secure and cost-effective storage of 8 10 TB of data collected daily Fast access to science archive by astronomers from 3 continents Intensive re-processing of archived data on GPU clusters calibration, imaging, etc. Continuous growth data volume (3PB/year), data variety (visibility, image cube, catalogue FITS, CASA, HDF5, JPEG2000) environment (Cortex à Pawsey) 4

6 Open Solution NGAS Openness Open source (L-GPL) originated from ESO for data archiving by Knudstrup and Wicenec (2000) Used by the astronomy community VLT, & La Silla, ALMA, evla, MWA Plugin-based - almost every feature is implemented as a plugin plugged into a light kernel Loss-free operation consistency checking, replication, fault-tolerance (e.g. power outage, disk failure, etc.) Object-based data storage Span multiple file systems distributed across multiple sites (e.g. VLT > 100 million objects) native HTTP interface (POST, GET, PUT, etc.) and location-transparent (Web admin UI) Scalable architecture Horizontal scalability add another NGAS instance Low cost, hardware-neutral deployment, Linux machine + Disk arrays Support mobile storage media with energy efficiency Ingestion, Staging, Long-term Archive, Temporary storage, Proxy, Processing, etc. Versioning 5

7 Open Solution NGAS The code base of the NGAS core software contains about 40,000 lines of Python code For the MWA Archive, we have made some optimisations (e.g.) Multiple stream concurrent data transfer across the Pacific ocean (> 60MB/s) New plugins (Proxy Archive) flexible data path routing New plugins for interacting with the Hierarchical Storage Management systems A new running mode NGAS Data mover A new in-archive processing framework (similar to MapReduce) 4,000 lines of C++ code to integrate the NGAS Client into the DataCapture system Openness allows us to Fully understand what is precisely happening on each section of the data path Make necessary changes / optimizations when necessary Make changes quickly in an agile fashion for verification and test 6

8 Data Ingestion Involves: Client push (synchronous) or Server pull (asynchronous) Find data type-specific storage medium with available capacity Receive data stream, and compute checksum (CRC) on the fly Data is stored temporarily at the Staging Area Data Archive Plug-In is invoked Quality check / compression Move file to the targeted volume / directory (e.g. MWA-DAPI) Register file in NGAS DB If replication defined, trigger file delivery threads file removal scheduling 7

9 Data Ingestion simulation simulators, each produces data at a rate of 16 MB/s to 2 NGAS servers running on 1 Supermicro node, and each NGAS server ingests data from 6 simulators Archiving simulation throughput 1200 Total archive throughput - MB/s A - 1Gbps bandwidth for commissioning B - 12 clients / 4 servers on 6 / 2 Fornax nodes C - aggregated data producing rate for 12 clients D - 12 clients / 4 servers on 6 idataplex / 1 Supermicro E - 12 clients / 2 servers on 6 idataplex / 1 Supermicro F - 24 clients / 4 servers on 6 / 1 Fornax nodes G - 24 clients / 4 servers on 24 / 2 Fornax nodes H - 24 clients / 4 servers on 12 / 2 Fornax nodes I - 24 clients / 4 servers on 28 Fornax nodes J - aggregated data producing rate for 24 clients Simulated data rate per client - MB/s 8

10 Data Ingestion field test field test Data ingestion at the MRO Online Archive. Each one of the five parallel streams records an ingestion rate over 75 MB/s. The aggregated ingestion rate amounts to 382 MB/s 9

11 Efficient Data Distribution Subscription-based data distribution Which subscriber has what data and since when? We are using HTTP to transfer 5TB per day from Western Australia to Boston, USA. (> 50 MB /s) HTTP is only an application protocol but not a transport / network protocol 10

12 Multi-tiered data storage More than 500TB since 23 Aug 2013 M&C DB Observation Portal MWA Pawsey Hierarchical Storage Management (DMF) Tape Library x32 x32 Tape Library x2 (CSIRO - library + cables + op9cs) DM-PUT DM-GET Stage Fast Disk Storage Bulk Disk Storage CXFS Release Stage Archive API POSIX FE Nodes Science DB 10 Gbps VO (TAP, SIAP) 11

13 Project A - # of accesses Project A - # of accesses Project B - # of accesses Project B - # of accesses Overall Interaccess Overall Inter-access Project C Interaccess Project C Inter-access 12

14 Data processing on Fornax +,-.(/#012,-0,3#-(45-#52-6 Fornax Cluster Long-term archive (045 7"82$(9":-+ ()*+,$ EHF G,1-#$-, G&,-#$%> /),+;)6 ;"<*(9":-+ ()*+,# +)$ +)# JobMAN ;".<3,-(9":-+!""$!""#!"%& Process!%"& RAM #$$# &' &( )% )! )" )* )& )) )+ ), )' )( +% +! +" +*! "!$""! " Distribute #$$#!"%% Local storage 7 TB / node CACHE &' &( )% )! )" )* )& )) )+ ), )' )( +% +! +" +*!"%- Data movement dominates the processing cycle!"%' =-5->"<.-$,(9":-+!"%. Disks + Tape!%"& 73+,#-(9":-+ NGAS Short Term Storage Data Processing Pipeline!)11#!)112!)113!/01#!)11&!)11'!)11.!)11- Disk Cache Stage! "!)11$ &' &( )% )! )" )* )& )) )+ ), )' )( +% +! +" +*!/01$ #$$#!%"& &' &( )% )! )" )* )& )) )+ ), )' )( +% +! +" +*! " #$$#!%"& &' &( )% )! )" )* )& )) )+ ), )' )( +% +! +" +*! " #$$#!%"& F(G,1-#$-, (+:9,19 Tape Libraries! 1 13

15 Local file system Global file systems 14

16 Data re-processing Web UI 15

17 12 April 2013: First image from the full MWA (105 tiles) Credit: MWA Science Commissioning Team 150 MHz (8 minutes of data) over some 35 degrees of the sky 16

18 Thank you for your attention 17

MWA Archive A multi-tiered dataflow and storage system based on NGAS

MWA Archive A multi-tiered dataflow and storage system based on NGAS MWA Archive A multi-tiered dataflow and storage system based on NGAS Chen Wu Team: Dave Pallot, Andreas Wicenec, Chen Wu Agenda Dataflow overview High-level requirements Next Generation Archive System

More information

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013 ASKAP Science Data Archive Talk outline Data flow in brief Some radio

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

StorReduce Technical White Paper Cloud-based Data Deduplication

StorReduce Technical White Paper Cloud-based Data Deduplication StorReduce Technical White Paper Cloud-based Data Deduplication See also at storreduce.com/docs StorReduce Quick Start Guide StorReduce FAQ StorReduce Solution Brief, and StorReduce Blog at storreduce.com/blog

More information

Optimising NGAS for the MWA Archive

Optimising NGAS for the MWA Archive Noname manuscript No. (will be inserted by the editor) Optimising NGAS for the MWA Archive C Wu A Wicenec D Pallot A Checcucci Received: date / Accepted: date Abstract The Murchison Widefield Array (MWA)

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image W H I T E P A P E R Hitachi NAS Platform and Hitachi Content Platform with ESRI Image Aciduisismodo Extension to ArcGIS Dolore Server Eolore for Dionseq Geographic Uatummy Information Odolorem Systems

More information

Bigdata High Availability (HA) Architecture

Bigdata High Availability (HA) Architecture Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources

More information

High Availability Databases based on Oracle 10g RAC on Linux

High Availability Databases based on Oracle 10g RAC on Linux High Availability Databases based on Oracle 10g RAC on Linux WLCG Tier2 Tutorials, CERN, June 2006 Luca Canali, CERN IT Outline Goals Architecture of an HA DB Service Deployment at the CERN Physics Database

More information

Enterprise Architectures for Large Tiled Basemap Projects. Tommy Fauvell

Enterprise Architectures for Large Tiled Basemap Projects. Tommy Fauvell Enterprise Architectures for Large Tiled Basemap Projects Tommy Fauvell Tommy Fauvell Senior Technical Analyst Esri Professional Services Washington D.C Regional Office Project Technical Lead: - Responsible

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

How To Design A Data Center

How To Design A Data Center Data Center Design & Virtualization Md. Jahangir Hossain Open Communication Limited jahangir@open.com.bd Objectives Data Center Architecture Data Center Standard Data Center Design Model Application Design

More information

A New Data Visualization and Analysis Tool

A New Data Visualization and Analysis Tool Title: A New Data Visualization and Analysis Tool Author: Kern Date: 22 February 2013 NRAO Doc. #: Version: 1.0 A New Data Visualization and Analysis Tool PREPARED BY ORGANIZATION DATE Jeff Kern NRAO 22

More information

Using Linux Clusters as VoD Servers

Using Linux Clusters as VoD Servers HAC LUCE Using Linux Clusters as VoD Servers Víctor M. Guĺıas Fernández gulias@lfcia.org Computer Science Department University of A Corunha funded by: Outline Background: The Borg Cluster Video on Demand.

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Tier 2 Nearline. As archives grow, Echo grows. Dynamically, cost-effectively and massively. What is nearline? Transfer to Tape

Tier 2 Nearline. As archives grow, Echo grows. Dynamically, cost-effectively and massively. What is nearline? Transfer to Tape Tier 2 Nearline As archives grow, Echo grows. Dynamically, cost-effectively and massively. Large Scale Storage Built for Media GB Labs Echo nearline systems have the scale and performance to allow users

More information

The Design and Implementation of the Zetta Storage Service. October 27, 2009

The Design and Implementation of the Zetta Storage Service. October 27, 2009 The Design and Implementation of the Zetta Storage Service October 27, 2009 Zetta s Mission Simplify Enterprise Storage Zetta delivers enterprise-grade storage as a service for IT professionals needing

More information

Introduction to LSST Data Management. Jeffrey Kantor Data Management Project Manager

Introduction to LSST Data Management. Jeffrey Kantor Data Management Project Manager Introduction to LSST Data Management Jeffrey Kantor Data Management Project Manager LSST Data Management Principal Responsibilities Archive Raw Data: Receive the incoming stream of images that the Camera

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Analisi di un servizio SRM: StoRM

Analisi di un servizio SRM: StoRM 27 November 2007 General Parallel File System (GPFS) The StoRM service Deployment configuration Authorization and ACLs Conclusions. Definition of terms Definition of terms 1/2 Distributed File System The

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM ESSENTIALS HIGH-SPEED, SCALABLE DEDUPLICATION Up to 58.7 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability DATA INVULNERABILITY ARCHITECTURE Inline write/read

More information

Long term retention and archiving the challenges and the solution

Long term retention and archiving the challenges and the solution Long term retention and archiving the challenges and the solution NAME: Yoel Ben-Ari TITLE: VP Business Development, GH Israel 1 Archive Before Backup EMC recommended practice 2 1 Backup/recovery process

More information

Symantec NetBackup Appliances

Symantec NetBackup Appliances Symantec NetBackup Appliances Simplifying Backup Operations Geoff Greenlaw Manager, Data Centre Appliances UK & Ireland January 2012 1 Simplifying Your Backups Reduce Costs Minimise Complexity Deliver

More information

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

More information

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved. THE EMC ISILON STORY Big Data In The Enterprise 2012 1 Big Data In The Enterprise Isilon Overview Isilon Technology Summary 2 What is Big Data? 3 The Big Data Challenge File Shares 90 and Archives 80 Bioinformatics

More information

Distributed Database Access in the LHC Computing Grid with CORAL

Distributed Database Access in the LHC Computing Grid with CORAL Distributed Database Access in the LHC Computing Grid with CORAL Dirk Duellmann, CERN IT on behalf of the CORAL team (R. Chytracek, D. Duellmann, G. Govi, I. Papadopoulos, Z. Xie) http://pool.cern.ch &

More information

Using Linux Clusters as VoD Servers

Using Linux Clusters as VoD Servers HAC LUCE Using Linux Clusters as VoD Servers Víctor M. Guĺıas Fernández gulias@lfcia.org Computer Science Department University of A Corunha funded by: Outline Background: The Borg Cluster Video on Demand.

More information

Backup Implementation Proposal

Backup Implementation Proposal Backup Implementation Proposal Document Revision:. 5//6 (wcw) 6 Carnegie Mellon University. All Rights Reserved 1 Ideal Multi-tier backup architecture Tier 1 Tier Tier Backup Server for client machines

More information

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Design and Implementation of a Storage Repository Using Commonality Factoring IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Axion Overview Potentially infinite historic versioning for rollback and

More information

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data David Minor 1, Reagan Moore 2, Bing Zhu, Charles Cowart 4 1. (88)4-104 minor@sdsc.edu San Diego Supercomputer Center

More information

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File

More information

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations Technical Product Management Team Endpoint Security Copyright 2007 All Rights Reserved Revision 6 Introduction This

More information

Archiving and Managing Remote Sensing Data Using State of the Art Storage Technologies

Archiving and Managing Remote Sensing Data Using State of the Art Storage Technologies Archiving and Managing Remote Sensing Data Using State of the Art Storage Technologies Ms B Lakshmi C Chandrasekhar Reddy SVSRK Kishore SDAPSA, NRSC, Hyderabad NRSC Functions Remote Sensing Data Acquisition

More information

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department

More information

Eloquence Training What s new in Eloquence B.08.00

Eloquence Training What s new in Eloquence B.08.00 Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium

More information

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack May 2015 Copyright 2015 SwiftStack, Inc. swiftstack.com Page 1 of 19 Table of Contents INTRODUCTION... 3 OpenStack

More information

ARCHIVING AND MANAGING REMOTE SENSING DATA USING STATE OF THE ART STORAGE TECHNOLOGIES

ARCHIVING AND MANAGING REMOTE SENSING DATA USING STATE OF THE ART STORAGE TECHNOLOGIES ARCHIVING AND MANAGING REMOTE SENSING DATA USING STATE OF THE ART STORAGE TECHNOLOGIES B Lakshmi a, C Chandrasekhara Reddy a, *, SVSRK Kishore a a NRSC, SDAPSA, Hyderabad, India (lakshmi_b,sekharareddy_cc,kishore_svsr)@nrsc.gov.in

More information

Object storage in Cloud Computing and Embedded Processing

Object storage in Cloud Computing and Embedded Processing Object storage in Cloud Computing and Embedded Processing Jan Jitze Krol Systems Engineer DDN We Accelerate Information Insight DDN is a Leader in Massively Scalable Platforms and Solutions for Big Data

More information

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Sponsored by: Prepared by: Eric Slack, Sr. Analyst May 2012 Storage Infrastructures for Big Data Workflows Introduction Big

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

Building a large scale CDN with Apache Trafficserver. Jan van Doorn jan_vandoorn@cable.comcast.com

Building a large scale CDN with Apache Trafficserver. Jan van Doorn jan_vandoorn@cable.comcast.com Building a large scale CDN with Apache Trafficserver Jan van Doorn jan_vandoorn@cable.comcast.com About me Engineer at Comcast Cable NaBonal Engineering & Technical OperaBons NETO- VSS- CDNENG Tech Lead

More information

Building Storage Service in a Private Cloud

Building Storage Service in a Private Cloud Building Storage Service in a Private Cloud Sateesh Potturu & Deepak Vasudevan Wipro Technologies Abstract Storage in a private cloud is the storage that sits within a particular enterprise security domain

More information

( ) ( ) TECHNOLOGY BRIEF. XTNDConnect Server: Scalability SCALABILITY REFERS TO HOW WELL THE SYSTEM ADAPTS TO INCREASED DEMANDS AND A GREATER

( ) ( ) TECHNOLOGY BRIEF. XTNDConnect Server: Scalability SCALABILITY REFERS TO HOW WELL THE SYSTEM ADAPTS TO INCREASED DEMANDS AND A GREATER TECHNOLOGY BRIEF XTNDConnect Server: Scalability An important consideration for IT professionals when choosing a server-based synchronization solution is that of scalability. Scalability refers to how

More information

Turbo Charge Your Data Protection Strategy

Turbo Charge Your Data Protection Strategy Turbo Charge Your Data Protection Strategy Data protection for the hybrid cloud 1 WAVES OF CHANGE! Data GROWTH User EXPECTATIONS Do It YOURSELF Can t Keep Up Reliability and Visibility New Choices and

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM EMC DATA DOMAIN OPERATING SYSTEM Powering EMC Protection Storage ESSENTIALS High-Speed, Scalable Deduplication Up to 58.7 TB/hr performance Reduces requirements for backup storage by 10 to 30x and archive

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Red Hat Storage Server

Red Hat Storage Server Red Hat Storage Server Marcel Hergaarden Solution Architect, Red Hat marcel.hergaarden@redhat.com May 23, 2013 Unstoppable, OpenSource Software-based Storage Solution The Foundation for the Modern Hybrid

More information

HA / DR Jargon Buster High Availability / Disaster Recovery

HA / DR Jargon Buster High Availability / Disaster Recovery HA / DR Jargon Buster High Availability / Disaster Recovery Welcome to Maxava s Jargon Buster. Your quick reference guide to Maxava HA and industry technical terms related to High Availability and Disaster

More information

Large File System Backup NERSC Global File System Experience

Large File System Backup NERSC Global File System Experience Large File System Backup NERSC Global File System Experience M. Andrews, J. Hick, W. Kramer, A. Mokhtarani National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory

More information

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to

More information

EMC BACKUP MEETS BIG DATA

EMC BACKUP MEETS BIG DATA EMC BACKUP MEETS BIG DATA Strategies To Protect Greenplum, Isilon And Teradata Systems 1 Agenda Big Data: Overview, Backup and Recovery EMC Big Data Backup Strategy EMC Backup and Recovery Solutions for

More information

The safer, easier way to help you pass any IT exams. Exam : 000-115. Storage Sales V2. Title : Version : Demo 1 / 5

The safer, easier way to help you pass any IT exams. Exam : 000-115. Storage Sales V2. Title : Version : Demo 1 / 5 Exam : 000-115 Title : Storage Sales V2 Version : Demo 1 / 5 1.The IBM TS7680 ProtecTIER Deduplication Gateway for System z solution is designed to provide all of the following EXCEPT: A. ESCON attach

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Taking Big Data to the Cloud. Enabling cloud computing & storage for big data applications with on-demand, high-speed transport WHITE PAPER

Taking Big Data to the Cloud. Enabling cloud computing & storage for big data applications with on-demand, high-speed transport WHITE PAPER Taking Big Data to the Cloud WHITE PAPER TABLE OF CONTENTS Introduction 2 The Cloud Promise 3 The Big Data Challenge 3 Aspera Solution 4 Delivering on the Promise 4 HIGHLIGHTS Challenges Transporting large

More information

Content Distribution Management

Content Distribution Management Digitizing the Olympics was truly one of the most ambitious media projects in history, and we could not have done it without Signiant. We used Signiant CDM to automate 54 different workflows between 11

More information

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects

More information

New Design and Layout Tips For Processing Multiple Tasks

New Design and Layout Tips For Processing Multiple Tasks Novel, Highly-Parallel Software for the Online Storage System of the ATLAS Experiment at CERN: Design and Performances Tommaso Colombo a,b Wainer Vandelli b a Università degli Studi di Pavia b CERN IEEE

More information

How To Write A Trusted Analytics Platform (Tap)

How To Write A Trusted Analytics Platform (Tap) Trusted Analytics Platform (TAP) TAP Technical Brief October 2015 TAP Technical Brief Overview Trusted Analytics Platform (TAP) is open source software, optimized for performance and security, that accelerates

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use.

NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use. NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use. Benefits High performance architecture Advanced security and reliability Increased operational efficiency More

More information

Versity 2013. All rights reserved.

Versity 2013. All rights reserved. From the only independent developer of large scale archival storage systems, the Versity Storage Manager brings enterpriseclass storage virtualization to the Linux platform. Based on Open Source technology,

More information

scalability OneBridge

scalability OneBridge scalability OneBridge Mobile Groupware technical brief An important consideration for IT professionals when choosing a server-based synchronization solution is that of scalability. Scalability refers to

More information

M710 - Max 960 Drive, 8Gb/16Gb FC, Max 48 ports, Max 192GB Cache Memory

M710 - Max 960 Drive, 8Gb/16Gb FC, Max 48 ports, Max 192GB Cache Memory SFD6 NEC *Gideon Senderov NEC $1.4B/yr in R & D Over 55 years in servers and storage (1958) SDN, Servers, Storage, Software M-Series and HYDRAstor *Chauncey Schwartz MX10-Series New models are M110, M310,

More information

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University

More information

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007 Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms Cray User Group Meeting June 2007 Cray s Storage Strategy Background Broad range of HPC requirements

More information

AFS Usage and Backups using TiBS at Fermilab. Presented by Kevin Hill

AFS Usage and Backups using TiBS at Fermilab. Presented by Kevin Hill AFS Usage and Backups using TiBS at Fermilab Presented by Kevin Hill Agenda History and current usage of AFS at Fermilab About Teradactyl How TiBS (True Incremental Backup System) and TeraMerge works AFS

More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

Next Generation Operating Systems

Next Generation Operating Systems Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the

More information

Shoal: IaaS Cloud Cache Publisher

Shoal: IaaS Cloud Cache Publisher University of Victoria Faculty of Engineering Winter 2013 Work Term Report Shoal: IaaS Cloud Cache Publisher Department of Physics University of Victoria Victoria, BC Mike Chester V00711672 Work Term 3

More information

UW-IT Backups & Archives

UW-IT Backups & Archives UW-IT Backups & Archives Powerful, Flexible, Affordable UW-IT TechTalk February 19, 2015 Agenda Definitions Yesterday Today Tomorrow Your thoughts Backups Defined Data is hot Primary data copy is on first-tier

More information

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz Enterprise GIS Architecture Deployment Options Andrew Sakowicz Audience Audience - Architects - Developers - Administrators - Project Managers Level: - Beginner / Intermediate Introduction Andrew Sakowicz

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Oracle Database 10g: Backup and Recovery 1-2

Oracle Database 10g: Backup and Recovery 1-2 Oracle Database 10g: Backup and Recovery 1-2 Oracle Database 10g: Backup and Recovery 1-3 What Is Backup and Recovery? The phrase backup and recovery refers to the strategies and techniques that are employed

More information

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud An Oracle White Paper July 2011 Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud Executive Summary... 3 Introduction... 4 Hardware and Software Overview... 5 Compute Node... 5 Storage

More information

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC EGEE and glite are registered trademarks Enabling Grids for E-sciencE Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC Elisa Lanciotti, Arnau Bria, Gonzalo

More information

BaBar and ROOT data storage. Peter Elmer BaBar Princeton University ROOT2002 14 Oct. 2002

BaBar and ROOT data storage. Peter Elmer BaBar Princeton University ROOT2002 14 Oct. 2002 BaBar and ROOT data storage Peter Elmer BaBar Princeton University ROOT2002 14 Oct. 2002 The BaBar experiment BaBar is an experiment built primarily to study B physics at an asymmetric high luminosity

More information

EMC arhiviranje. Lilijana Pelko Primož Golob. Sarajevo, 16.10.2008. Copyright 2008 EMC Corporation. All rights reserved.

EMC arhiviranje. Lilijana Pelko Primož Golob. Sarajevo, 16.10.2008. Copyright 2008 EMC Corporation. All rights reserved. EMC arhiviranje Lilijana Pelko Primož Golob Sarajevo, 16.10.2008 1 Agenda EMC Today Reasons to archive EMC Centera EMC EmailXtender EMC DiskXtender Use cases 2 EMC Strategic Acquisitions: Strengthen and

More information

Scalable Multi-Node Event Logging System for Ba Bar

Scalable Multi-Node Event Logging System for Ba Bar A New Scalable Multi-Node Event Logging System for BaBar James A. Hamilton Steffen Luitz For the BaBar Computing Group Original Structure Raw Data Processing Level 3 Trigger Mirror Detector Electronics

More information

Utilizing the SDSC Cloud Storage Service

Utilizing the SDSC Cloud Storage Service Utilizing the SDSC Cloud Storage Service PASIG Conference January 13, 2012 Richard L. Moore rlm@sdsc.edu San Diego Supercomputer Center University of California San Diego Traditional supercomputer center

More information

ntier Verde Simply Affordable File Storage

ntier Verde Simply Affordable File Storage ntier Verde Simply Affordable File Storage Current Market Problems Data Growth Continues Data Retention Increases By 2020 the Digital Universe will hold 40 Zettabytes The Market is Missing: An easy to

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Archiving On-Premise and in the Cloud. March 2015

Archiving On-Premise and in the Cloud. March 2015 Archiving On-Premise and in the Cloud March 2015 Cloud Storage Storage accessed over a network via web services APIs. http://swift.example.com/v1/account/container/object Source: http://docs.openstack.org/admin-guide-cloud/content/objectstorage_characteristics.html

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Business Continuity with the. Concerto 7000 All Flash Array. Layers of Protection for Here, Near and Anywhere Data Availability

Business Continuity with the. Concerto 7000 All Flash Array. Layers of Protection for Here, Near and Anywhere Data Availability Business Continuity with the Concerto 7000 All Flash Array Layers of Protection for Here, Near and Anywhere Data Availability Version 1.0 Abstract Concerto 7000 All Flash Array s Continuous Data Protection

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

Data Mining with Hadoop at TACC

Data Mining with Hadoop at TACC Data Mining with Hadoop at TACC Weijia Xu Data Mining & Statistics Data Mining & Statistics Group Main activities Research and Development Developing new data mining and analysis solutions for practical

More information

Use of Hadoop File System for Nuclear Physics Analyses in STAR

Use of Hadoop File System for Nuclear Physics Analyses in STAR 1 Use of Hadoop File System for Nuclear Physics Analyses in STAR EVAN SANGALINE UC DAVIS Motivations 2 Data storage a key component of analysis requirements Transmission and storage across diverse resources

More information

Lessons learned from parallel file system operation

Lessons learned from parallel file system operation Lessons learned from parallel file system operation Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

More information