MWA Archive A multi-tiered dataflow and storage system based on NGAS

1 MWA Archive: A multi-tiered dataflow and storage system based on NGAS
Chen Wu
Team: Dave Pallot, Andreas Wicenec, Chen Wu

2 Agenda
- Dataflow overview
- High-level requirements
- Next Generation Archive System (NGAS)
- Additional development to tailor NGAS for MWA
- Meet requirements
  - Data capturing
  - Data ingestion
  - Data storage
  - Data access
  - Data mirroring
- Work-in-progress
  - Data re-processing
- Conclusion

3 32T Dataflow
[Dataflow diagram. Labels: Online processing; 320Gb/s; Online archive; MRO; ~400 MB/s; 1/10Gbps; 1 Gbps; Staging archive; ICRAR, Perth; AARNet; 10Gbps; Mirrored archive; MIT, USA; Long-term archive; PBStore, Perth; Staging & processing; Fornax, Perth; Data distribution, placement, scheduling; Web; Scripting UI; Catalog; VO; VO-Table; Research.]

4 128T Dataflow
[Dataflow diagram. Labels: Online processing; 320Gb/s; Online archive; MRO; ~400 MB/s; 1 Gbps; 1/10 Gbps; Proxy Archive; 1Gbps; AARNet; Mirrored archive; ICRAR, Perth; MIT, USA; Long-term archive; PBStore, Perth; Staging & processing; Fornax, Perth; Data distribution, placement, scheduling; Web; 10Gbps; Scripting UI; Catalog; VO; VO-Table; Research.]

5 Conceptual Dataflow
[Tier diagram. Labels: Tier 0; RF data stream; metadata; control; monitor; Online Processing; Online Archive; Tier 1; Proxy Archive; Long-Term Archive; Offline Processing; Tier 2; Mirrored Archive (several); Further processing (several).]

6 High-level Requirements
- High throughput data ingestion
  - ~400MB/s visibility, MB/s vis + image cube
- Efficient data distribution to multiple locations
  - Australia / New Zealand / USA / India
- Secure and cost-effective storage of 8–10 TB of data collected daily
- Fast access to science archive by astronomers from 3 continents
- Intensive re-processing of archived data on GPU clusters
  - calibration, imaging, etc.
- Continuous growth
  - data volume (3PB/year)
  - data variety (visibility, image cube, catalogue; FITS, CASA, HDF5, JPEG2000)
  - environment (Cortex → Pawsey)
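The volume figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB, 1 TB = 1,000,000 MB) and a 365-day year; the function names are illustrative and do not come from the slides.

```python
# Back-of-envelope check of the daily and yearly volumes quoted on the slide.
# Assumption: decimal units and 365 observing days per year.
TB_PER_PB = 1000
MB_PER_TB = 1_000_000
SECONDS_PER_DAY = 86_400


def yearly_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Yearly volume in PB for a given daily volume in TB."""
    return tb_per_day * days / TB_PER_PB


def continuous_daily_volume_tb(rate_mb_s: float) -> float:
    """Daily volume in TB if a stream ran at rate_mb_s around the clock."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


low = yearly_volume_pb(8)
high = yearly_volume_pb(10)
peak_day = continuous_daily_volume_tb(400)
print(f"{low:.2f} to {high:.2f} PB/year; {peak_day:.2f} TB/day at a sustained 400 MB/s")
```

Under these assumptions, 8–10 TB per day is roughly 2.9 to 3.7 PB per year, which is in line with the quoted 3PB/year. A stream sustained at 400 MB/s for a full day would produce about 34.6 TB, so the daily figure implies that ingestion does not run at the peak rate around the clock.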

7 Next Generation Archive System (NGAS)
- Openness
  - Open source (LGPL), originated from ESO for data archiving by Knudstrup and Wicenec (2000)
  - Used by the astronomy community: VLT & La Silla, ALMA, EVLA, MWA
  - ALMA: All 5 ALMA sites are based on NGAS, including all data transfers and mirroring from the observatory to all other sites across the globe. Full high-availability setup with automatic fail-over
  - Plugin-based: almost every feature is implemented as a plugin plugged into a light kernel
- Loss-free operation
  - consistency checking, replication, fault-tolerance (e.g. power outage, disk failure, etc.)
- Object-based data storage
  - Span multiple file systems distributed across multiple sites (e.g. VLT > 100 million objects)
  - Native HTTP interface (POST, GET, PUT, etc.) and location-transparent (Web admin UI)
- Scalable architecture
  - Horizontal scalability: add another NGAS instance
  - In-storage processing: compute inside the archive
- Low cost, hardware-neutral deployment, versatility
  - Linux machine + disk arrays
  - Support mobile storage media with energy efficiency
  - Ingestion, Staging, Long-term Archive, Temporary storage, Proxy, Processing, etc.

8 Additional efforts needed for MWA
- High-throughput data ingestion
  - 66 MB/s (ALMA) vs. 400 MB/s (MWA)
  - Completely integrated systems vs. theoretical performance
- Efficient dataflow control
  - Multiple mirrored archives, each with distinct subscription rules
  - Saturate WAN bandwidth
  - In-transit data processing
- Multi-tiered data storage
  - Work well with HSM, e.g. retrieve a file from tape → HTTP GET timeout
  - Content-based access pattern classification for placement optimisation
- Workflow-optimised data staging to avoid I/O contention
  - Staging data from long-term archive to Fornax (time and placement)
  - Tasks in a complex workflow exchange data through a shared file system (but can we exploit local storage as well?)

9 Data MRO
- A single process
- Plugin-based
- Multi-threaded
- Throughput-oriented
- Fault-tolerant
- Admission control
[Architecture diagram. Labels: Data Producer; DataCaptureMgr; M&C Interface; M&C Subsystem; File handling; In-memory Buffer; DataHandler (two); Staging Area; NGAS Client; HDD; Ramdisk; SSD; Data Network; Local NGAS Servers; NG/AMS Servers; Metadata Network; Archive DB; Web / Query / VO Interface; Wide Area Network; Remote NGAS Servers.]

10 Data MRO
- Client push (synchronous) or server pull (asynchronous)
- Find data type-specific storage medium with available capacity
- Receive data stream, and compute checksum (CRC) on the fly
- Data is stored temporarily at the Staging Area
- Data Archive Plug-In is invoked
  - Quality check / compression
  - Move file to the targeted volume / directory (e.g. MWA-DAPI)
- Register file in NGAS DB
- If replication defined, trigger file delivery threads
- File removal scheduling
[Figure: 128T Ingestion MRO]
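The ingestion steps above can be sketched in a few lines. This is a hypothetical illustration, not NGAS code: the function `ingest`, the `.part` suffix, the chunk size, and the dictionary standing in for the NGAS DB are all assumptions made for the example. It shows the key idea of computing the CRC while the stream is being written to the staging area, so no second pass over the data is needed.

```python
import os
import shutil
import zlib
from typing import BinaryIO, Dict

CHUNK = 64 * 1024  # read size in bytes (assumed value)


def ingest(stream: BinaryIO, file_id: str, staging_dir: str,
           volume_dir: str, registry: Dict[str, dict]) -> dict:
    """Receive a stream, checksum it on the fly, stage it, then archive it."""
    staged = os.path.join(staging_dir, file_id + ".part")
    crc, size = 0, 0
    with open(staged, "wb") as out:
        while True:
            chunk = stream.read(CHUNK)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC over the chunks seen so far
            size += len(chunk)
            out.write(chunk)
    # A data archive plug-in (quality check, compression) would run here.
    final = os.path.join(volume_dir, file_id)
    shutil.move(staged, final)  # move from staging area to the target volume
    record = {"path": final, "size": size, "crc32": crc & 0xFFFFFFFF}
    registry[file_id] = record  # stand-in for registering the file in the NGAS DB
    return record
```

Replication triggers and removal scheduling are left out; in this sketch they would hook in after the registry update.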

11 Archiving simulation throughput
[Chart. Y axis: Total archive throughput - MB/s. X axis: Simulated data rate per client - MB/s.]
Legend:
- A - 1Gbps bandwidth for commissioning
- B - 12 clients / 4 servers on 6 / 2 Fornax nodes
- C - aggregated data producing rate for 12 clients
- D - 12 clients / 4 servers on 6 idataplex / 1 Supermicro
- E - 12 clients / 2 servers on 6 idataplex / 1 Supermicro
- F - 24 clients / 4 servers on 6 / 1 Fornax nodes
- G - 24 clients / 4 servers on 24 / 2 Fornax nodes
- H - 24 clients / 4 servers on 12 / 2 Fornax nodes
- I - 24 clients / 4 servers on 28 Fornax nodes
- J - aggregated data producing rate for 24 clients

12 Archive Servers CPU and File Handling
[Chart panels: 16MB/s, 12 clients; 48MB/s, 12 clients; 56MB/s, 24 clients; 16MB/s, 12 clients; 16MB/s, 24 clients; 40MB/s, 12 clients.]

13 Data Cortex 32T
[Diagram and chart. Labels: Distribution of observation size; NGAS DB; Science DB; API; Web Portal; 1Gbps → 10Gbps; 32T disk archiving rate; Staging archive; ICRAR, Perth.]

14 Data Cortex 128T
[Diagram and chart. Labels: Web Portal; Distribution of observation size; NGAS DB; Science DB; API; 1Gbps; Online archive; 128T disk archiving rate.]

Excerpt shown on the slide (developers.sun.com/solaris/articles/libsam.html):

- The time consumed to restore the data in case of disaster
- The high cost of resources required for storage, network, tape, and administration

An alternative is to use the LibSAM library for the Sun StorageTek Storage Archive Manager (SAM), a paradigm for data management that goes beyond backup.

What Is the LibSAM Library?

Designed to use with Sun StorageTek SAM and Sun StorageTek QFS software, the LibSAM library, or API, allows you to manage data in a samfs file system from within an application.

The model employed is client-server. A client process makes requests to a server process. The server processes the requests and returns the processing status to the client. In the simplest case, as is the case with LibSAM, the server and client run on the same machine. Therefore, all requests are local and translate into system calls to the kernel.

Basic Concepts and Implementation

Before this article delves into the details of how LibSAM is used to overcome the limitations of traditional backup mechanisms, you should understand some basic concepts associated with each function. This article will first discuss the four major components of the Sun StorageTek SAM archive management system:

- Archiving
- Releasing
- Staging
- Recycling

Archiving

Archiving, the process of backing up a file by copying it from a file system to archive media, is typically the first component. The archive media can be a removable media cartridge or a disk partition of another file system. Figure shows the basic components of the archiving process.

15 Data Cortex
- Client pull (synchronous) or server push (asynchronous)
- Pull: given a file identifier and version, get the file regardless of its location
  - If multiple replicas are found, follow: host → cluster → domain → tape → mirrored archive
- Push: given a list of file identifiers and deliver_to_url,
  - Sort files, then push online files while staging offline ones from tape
  - Support suspend/resume/cancel and persistency
  - Server-directed I/O → better optimisation (queue mgmt., multiple requests aggregation, etc.)
- Data Processing Plug-Ins can be invoked prior to file retrieval
  - E.g. decompressing, sub-setting, explicit staging, etc.
  - Get FITS header info → implicit staging → blocking
[Figure: 32T access stats release]
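The pull and push rules above can be illustrated with a small sketch. Everything named here (the `Replica` record, `LOCATION_ORDER`, `pick_replica`, `push_order`) is hypothetical and only mirrors the ordering stated on the slide; it is not the NGAS implementation.

```python
from dataclasses import dataclass
from typing import List

# Preference order from the slide: nearest location first.
LOCATION_ORDER = ["host", "cluster", "domain", "tape", "mirrored_archive"]


@dataclass
class Replica:
    file_id: str
    version: int
    location: str  # one of LOCATION_ORDER
    url: str


def pick_replica(replicas: List[Replica], file_id: str, version: int) -> Replica:
    """Pull: return the closest replica of (file_id, version)."""
    candidates = [r for r in replicas
                  if r.file_id == file_id and r.version == version]
    if not candidates:
        raise LookupError(f"no replica of {file_id} v{version}")
    return min(candidates, key=lambda r: LOCATION_ORDER.index(r.location))


def push_order(replicas: List[Replica]) -> List[Replica]:
    """Push: online files first, so tape staging can proceed in the meantime."""
    # sorted() is stable, so the relative order within each group is kept.
    return sorted(replicas, key=lambda r: r.location == "tape")
```

Sorting online files ahead of tape-resident ones means delivery can start immediately while the slower tape recalls run in the background.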

16 Access Cortex 32T
[Charts of # of accesses, with series: Overall, First access, Inter-accesses. Panels: C118 Fornax A; C106 SUN Tracks; C104 EOR Fields; W44 Galactic plane field.]

17 Access Cortex 32T

18 Data Mirroring
- Publish / Subscribe mode
  - Criteria → Subscription Filter Plugin
  - Events: file ingestion, subscribe command, explicit triggering, etc.
  - Subscribe into the past
  - Each subscriber is given a queue
  - Each subscriber / queue is assigned a priority
- Transfer
  - HTTP / FTP / GridFTP (concurrent transfer, file pipeline, parallel connection)
  - Target can be anything that supports HTTP / FTP / GridFTP, e.g. python -m SimpleHTTPServer
  - Failed deliveries are recorded in the backlog
  - They are re-sent through explicit triggering when connections return to normal
- Support proxy mode
  - Bypass slow hops
  - In-transit data processing
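The publish / subscribe behaviour listed above (a filter per subscriber, one queue per subscriber, priorities, "subscribe into the past", and a backlog for failed deliveries) can be sketched as follows. The class and method names are hypothetical, the `send` callable stands in for an HTTP / FTP / GridFTP transfer, and treating a lower priority number as "served first" is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Subscriber:
    url: str
    priority: int                 # assumption: lower value is served first
    wants: Callable[[str], bool]  # stand-in for a subscription filter plug-in
    queue: Deque[str] = field(default_factory=deque)
    backlog: List[str] = field(default_factory=list)


class Mirror:
    def __init__(self, send: Callable[[str, str], bool]):
        self.send = send  # returns True when the delivery succeeded
        self.subscribers: List[Subscriber] = []
        self.archived: List[str] = []

    def subscribe(self, sub: Subscriber, into_the_past: bool = False) -> None:
        self.subscribers.append(sub)
        if into_the_past:  # also queue matching files that were ingested earlier
            sub.queue.extend(f for f in self.archived if sub.wants(f))

    def on_ingest(self, file_id: str) -> None:
        self.archived.append(file_id)
        for sub in self.subscribers:
            if sub.wants(file_id):
                sub.queue.append(file_id)

    def deliver(self) -> None:
        for sub in sorted(self.subscribers, key=lambda s: s.priority):
            while sub.queue:
                file_id = sub.queue.popleft()
                if not self.send(sub.url, file_id):
                    sub.backlog.append(file_id)  # keep for a later retry

    def retry_backlog(self) -> None:
        """Explicit trigger: re-queue and re-send previously failed deliveries."""
        for sub in self.subscribers:
            sub.queue.extend(sub.backlog)
            sub.backlog.clear()
        self.deliver()
```

Keeping a separate queue and backlog per subscriber means a slow or unreachable mirror does not hold up deliveries to the others.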

19 Mirrored MIT
- 12 streams - Overall rate: / MB/s
- Cortex TCP window size: 16.0 MB, RTT ~0.281 secs
- Max send buffer: 128KB - MB/s (93.35Mb/s)
- Max send buffer: 64MB - MB/s (137.19Mb/s), 47% higher
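The figures on this slide can be checked with simple arithmetic. The sketch below assumes that "Mb/s" means megabits per second, and it uses the standard window / RTT bound for a single TCP connection; the function names are illustrative. The bound is a theoretical per-connection ceiling, not a measurement from the slides.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


def window_limited_rate_mb_s(window_mb: float, rtt_s: float) -> float:
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_mb / rtt_s


gain = percent_gain(93.35, 137.19)
ceiling = window_limited_rate_mb_s(16.0, 0.281)
print(f"gain: {gain:.1f}%")
print(f"per-connection ceiling: {ceiling:.1f} MB/s")
print(f"{mbit_to_mbyte(93.35):.2f} MB/s -> {mbit_to_mbyte(137.19):.2f} MB/s")
```

The computed gain of about 47% agrees with the slide. With a 16.0 MB window and an RTT of about 0.281 s, a single connection cannot exceed roughly 57 MB/s regardless of link capacity, which is one reason long-haul transfers rely on several parallel streams.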

20 Data Re-processing (WIP)
- Requirements
  - Stage data from Cortex to Fornax
  - Process: calibration and imaging pipelines with data placed on Fornax
  - Archive calibrated data / image cubes back to Cortex (optional, not considered for 128T)
- Data staging strategies (when, how much, how frequent)
  - Decoupled from processing → no overlapping
    - Staging is submitted as another job by users prior to submitting processing jobs
  - Loosely-coupled with processing → some overlapping
    - Staging is a task inside the same job, can be asynchronous
  - Tightly-coupled with processing → intertwined
    - Processing and data staging in coordination
- Data placement strategies (where, when to clean up / update)
  - Loosely-coupled with processing → global file system (i.e. Lustre)
    - User-friendly abstraction: POSIX API, easiest for developers
  - Tightly-coupled with processing → application-optimised file system
    - Expose file block location in file metadata, e.g. Google File System, HDFS
  - Dynamically-coupled with processing → temporary data storage and pipeline co-design
    - Cache storage
    - Multi-level scheduling
[Diagram labels: Stage; Process; Place.]
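The difference between the decoupled and the loosely-coupled staging strategies can be made concrete with a small sketch. The functions `stage` and `process` are placeholders supplied by the caller, and the single-worker prefetch shown here is only one possible way to overlap the two; the slides do not prescribe an implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_decoupled(files: Iterable[str],
                  stage: Callable[[str], str],
                  process: Callable[[str], str]) -> List[str]:
    """No overlap: every file is staged before any processing starts."""
    staged = [stage(f) for f in files]
    return [process(s) for s in staged]


def run_loosely_coupled(files: Iterable[str],
                        stage: Callable[[str], str],
                        process: Callable[[str], str]) -> List[str]:
    """Some overlap: file i+1 is staged in the background while file i is processed."""
    todo = list(files)
    results: List[str] = []
    if not todo:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(stage, todo[0])
        for nxt in todo[1:]:
            staged = pending.result()         # wait for the current file
            pending = pool.submit(stage, nxt)  # start staging the next one
            results.append(process(staged))    # process while staging runs
        results.append(process(pending.result()))
    return results
```

Both functions return the same results in the same order. The loosely-coupled variant hides staging latency behind processing time, at the cost of holding one extra staged file at a time.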

21 Data Re-processing on Fornax
- Uniqueness of Fornax for MWA
  - GPU cluster → MWA Real-Time System (RTS)
  - Local storage (7TB/node x 96 nodes) → RTS compute storage → in-storage processing
  - Dual Infiniband connections → RTS data movement
- Therefore, MWA has applied for 25% of the time on Fornax
- Fornax local storage is treated as a cache for MWA
[Storage hierarchy diagram. Labels: SRAM; DRAM; Frequency splits; Fornax Local Storage; Cortex Disk Cache; Visibility files; Cortex Tape Libraries.]

22 Fornax block diagram
[Block diagram of the Fornax cluster; its labels did not survive extraction. Credit: Fornax Architecture: Chris Harris]

23 Processing-optimised NGAS Cache Storage
[Diagram overlaid on the Fornax block diagram. Recoverable labels: Long-term archive; Mirrored archive (two); Async staging; EOR.]

24 Conclusion
- Requirements: MWA
  - Data ingestion, transferring, storage, access, staging, re-processing
- NGAS
  - Open (LGPL) software deployed in astronomy archive facilities around the globe
  - NGAS tailored for MWA fulfills the above requirements and meets the MWA data challenge
    - "just access whole observations whenever I like, without going to file cabinets, getting tapes, mounting and then trying to figure out what to do" (an MWA Super Science Fellow)
  - The NGAS-based MWA data archive solution is working fine during commissioning
- Fornax
  - The major re-processing powerhouse and data staging hotspot for MWA
  - One of few data-intensive clusters suited for MWA data reprocessing
- What's next → Science archive
  - Data modeling: ObsCore
  - Data access: VO
  - Tools: ObsTAP, Saada, OpenCADC

The Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia

The Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia The Murchison Widefield Array Data Archive System Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia Agenda Dataflow Requirements Solutions & Lessons learnt Open solution

More information

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud An Oracle White Paper July 2011 Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud Executive Summary... 3 Introduction... 4 Hardware and Software Overview... 5 Compute Node... 5 Storage

More information

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack May 2015 Copyright 2015 SwiftStack, Inc. swiftstack.com Page 1 of 19 Table of Contents INTRODUCTION... 3 OpenStack

More information

Data Movement and Storage. Drew Dolgert and previous contributors

Data Movement and Storage. Drew Dolgert and previous contributors Data Movement and Storage Drew Dolgert and previous contributors Data Intensive Computing Location Viewing Manipulation Storage Movement Sharing Interpretation $HOME $WORK $SCRATCH 72 is a Lot, Right?

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT SS Data & Storage CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT HEPiX Fall 2012 Workshop October 15-19, 2012 Institute of High Energy Physics, Beijing, China SS Outline

More information

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University

More information

(WKHUQHW 3& &DUG 0RGHOV '(0993&7 '(09937 '(0993/7

(WKHUQHW 3& &DUG 0RGHOV '(0993&7 '(09937 '(0993/7 (WKHUQHW 3& &DUG 0RGHOV '(0993&7 '(09937 '(0993/7 8VHU V *XLGH Rev. 08w (October 2004) 9'(099301110136 3ULQWHG LQ 7DLZDQ 5(&

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

Integrating VoltDB with Hadoop

Integrating VoltDB with Hadoop The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

The Google File System

The Google File System The Google File System By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Presented at SOSP 2003) Introduction Google search engine. Applications process lots of data. Need good file system. Solution:

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013 ASKAP Science Data Archive Talk outline Data flow in brief Some radio

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

Bigdata High Availability (HA) Architecture

Bigdata High Availability (HA) Architecture Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo, 2010 Agenda Topic 1: Introduction Topic 2: Architecture

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

APPENDIX H. CONCEPT DEVELOPMENT

APPENDIX H. CONCEPT DEVELOPMENT APPENDIX H. CONCEPT DEVELOPMENT +,17(502'$/75$163257$7,21 &(17(5237,216 7KH,7& LOOXVWUDWHG LQ)LJXUHV+WKURXJK+ LV LQWHQGHG WR VHUYH DV WKH SUHPLHU VKRUWWHUPEXVLQHVV WUDYHOHU SDUNLQJ RSWLRQ IRUWKHDLUSRUW7KH,7&DOVRVHUYHVDVWKHDLUSRUW

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

GraySort on Apache Spark by Databricks

GraySort on Apache Spark by Databricks GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE DSS Data & Diskpool and cloud storage benchmarks used in IT-DSS CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Geoffray ADDE DSS Outline I- A rational approach to storage systems evaluation

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

Taking Big Data to the Cloud. Enabling cloud computing & storage for big data applications with on-demand, high-speed transport WHITE PAPER

Taking Big Data to the Cloud. Enabling cloud computing & storage for big data applications with on-demand, high-speed transport WHITE PAPER Taking Big Data to the Cloud WHITE PAPER TABLE OF CONTENTS Introduction 2 The Cloud Promise 3 The Big Data Challenge 3 Aspera Solution 4 Delivering on the Promise 4 HIGHLIGHTS Challenges Transporting large

More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data David Minor 1, Reagan Moore 2, Bing Zhu, Charles Cowart 4 1. (88)4-104 minor@sdsc.edu San Diego Supercomputer Center

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Performance and scalability of a large OLTP workload

Performance and scalability of a large OLTP workload Performance and scalability of a large OLTP workload ii Performance and scalability of a large OLTP workload Contents Performance and scalability of a large OLTP workload with DB2 9 for System z on Linux..............

More information

Storage Virtualization. Andreas Joachim Peters CERN IT-DSS
