MWA Archive: A multi-tiered dataflow and storage system based on NGAS
Slide 1: MWA Archive: A multi-tiered dataflow and storage system based on NGAS
Chen Wu. Team: Dave Pallot, Andreas Wicenec, Chen Wu
Slide 2: Agenda
- Dataflow overview
- High-level requirements
- Next Generation Archive System (NGAS)
- Additional development to tailor NGAS for MWA and meet the requirements: data capturing, data ingestion, data storage, data access, data mirroring
- Work-in-progress: data re-processing
- Conclusion
Slide 3: 32T Dataflow (diagram)
- At the MRO: online processing (320 Gb/s in) feeding the online archive (~400 MB/s out, 1/10 Gbps links)
- Staging archive at ICRAR, Perth (1 Gbps); long-term archive on PBStore, Perth; staging and processing on Fornax, Perth; mirrored archive at MIT, USA over AARNet (10 Gbps)
- Data distribution, placement and scheduling; access via web, scripting UI and VO (catalogue, VO-Table)
Slide 4: 128T Dataflow (diagram)
- As for 32T, with a proxy archive (1 Gbps) inserted between the MRO (online processing at 320 Gb/s, online archive at ~400 MB/s) and the mirrored archive (MIT, USA, via AARNet), long-term archive (PBStore, Perth) and staging/processing (Fornax, Perth, 10 Gbps)
- Data distribution, placement and scheduling; access via web, scripting UI and VO (catalogue, VO-Table)
Slide 5: Conceptual Dataflow (diagram)
- Tier 0: RF data stream into the online archive and online processing, with control, monitoring and metadata
- Proxy archive and multiple mirrored archives (tiers 1 and 2), each feeding further processing
- Long-term archive with offline processing
Slide 6: High-level Requirements
- High-throughput data ingestion: ~400 MB/s for visibilities, more for visibilities plus image cubes
- Efficient data distribution to multiple locations: Australia / New Zealand / USA / India
- Secure and cost-effective storage of the 8-10 TB of data collected daily
- Fast access to the science archive by astronomers on three continents
- Intensive re-processing of archived data on GPU clusters: calibration, imaging, etc.
- Continuous growth in data volume (3 PB/year), data variety (visibility, image cube, catalogue; FITS, CASA, HDF5, JPEG2000) and environment (Cortex → Pawsey)
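A quick sanity check shows these headline numbers are mutually consistent. This is a back-of-the-envelope sketch; the 9 TB/day midpoint of the stated 8-10 TB range is an assumed value.

```python
# Sanity-check the figures on this slide (illustrative arithmetic only).
SECONDS_PER_DAY = 24 * 60 * 60

# ~400 MB/s, if sustained around the clock, would be ~34.6 TB/day;
# the 8-10 TB actually collected implies a much lower ingest duty cycle.
peak_daily_tb = 400e6 * SECONDS_PER_DAY / 1e12

# ~9 TB/day (assumed midpoint of 8-10 TB) over a year is ~3.3 PB,
# in line with the stated 3 PB/year growth.
yearly_pb = 9e12 * 365 / 1e15
```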
Slide 7: Next Generation Archive System (NGAS)
- Openness: open source (LGPL), originated at ESO for data archiving (Knudstrup and Wicenec, 2000)
- Used by the astronomy community: VLT, La Silla, ALMA, EVLA, MWA
  - ALMA: all 5 ALMA sites run NGAS, including all data transfers and mirroring from the observatory to the other sites across the globe, in a full high-availability setup with automatic fail-over
- Plugin-based: almost every feature is implemented as a plugin hooked into a light kernel
- Loss-free operation: consistency checking, replication, fault tolerance (e.g. power outage, disk failure)
- Object-based data storage: spans multiple file systems distributed across multiple sites (e.g. VLT, > 100 million objects); native HTTP interface (POST, GET, PUT, etc.) and location-transparent access (web admin UI)
- Scalable architecture: horizontal scalability by adding another NGAS instance
- In-storage processing: compute inside the archive
- Low-cost, hardware-neutral deployment and versatility: a Linux machine plus disk arrays; supports mobile storage media for energy efficiency; roles include ingestion, staging, long-term archive, temporary storage, proxy and processing
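Because the interface is plain HTTP, any HTTP client can talk to an NGAS server. A minimal sketch of building an NGAS-style RETRIEVE URL follows; the host, port and file names are hypothetical, and the exact command set should be taken from the NGAS documentation rather than from this sketch.

```python
from urllib.parse import urlencode

def retrieve_url(host, port, file_id, file_version=None):
    """Build a RETRIEVE request URL in the NGAS style (illustrative only)."""
    params = {"file_id": file_id}
    if file_version is not None:
        params["file_version"] = file_version
    return f"http://{host}:{port}/RETRIEVE?{urlencode(params)}"

# Any HTTP client (curl, urllib, a browser) could then fetch the file:
url = retrieve_url("ngas.example.org", 7777, "1065880128_vis.fits", 1)
```

Location transparency means the client never needs to know which volume, disk or site actually holds the object behind that identifier.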
Slide 8: Additional efforts needed for MWA
- High-throughput data ingestion: 66 MB/s (ALMA) vs. 400 MB/s (MWA); completely integrated systems vs. theoretical performance
- Efficient dataflow control: multiple mirrored archives, each with distinct subscription rules; saturate WAN bandwidth; in-transit data processing
- Multi-tiered data storage: work well with HSM (e.g. retrieving a file from tape can exceed the HTTP GET timeout); content-based access-pattern classification for placement optimisation
- Workflow-optimised data staging to avoid I/O contention: staging data from the long-term archive to Fornax (time and placement); tasks in a complex workflow exchange data through a shared file system (but can we exploit local storage as well?)
Slide 9: Data Capturing (MRO) (diagram)
- At the MRO: data producer → DataCaptureMgr → local NG/AMS (NGAS) servers on the data network; remote NGAS servers reached over the WAN; web/query/VO interface and archive DB on the metadata network; M&C subsystem and interface
- DataCaptureMgr: a single process; plugin-based, multi-threaded, throughput-oriented and fault-tolerant; admission control and file handling; an in-memory buffer feeds DataHandlers that write to the staging area (HDD, ramdisk, SSD) via the NGAS client
Slide 10: Data Ingestion (MRO)
- Client push (synchronous) or server pull (asynchronous)
- Find a data-type-specific storage medium with available capacity
- Receive the data stream and compute the checksum (CRC) on the fly
- Data is stored temporarily in the staging area
- The Data Archive Plug-In is invoked: quality check / compression; move the file to the target volume/directory (e.g. MWA-DAPI)
- Register the file in the NGAS DB
- If replication is defined, trigger file-delivery threads and file-removal scheduling
(128T ingestion at the MRO)
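The "checksum on the fly" step can be sketched as follows: the incoming stream is written to the staging area chunk by chunk while a running CRC32 is updated, so no second pass over a multi-gigabyte file is needed. Function and directory names here are illustrative, not the actual NGAS code.

```python
import io
import os
import tempfile
import zlib

CHUNK = 64 * 1024  # stream in 64 KB chunks

def stage_with_crc(stream, staging_dir, name):
    """Write an incoming stream to the staging area, updating CRC32 per chunk."""
    crc = 0
    path = os.path.join(staging_dir, name)
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(CHUNK)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running checksum, no extra pass
            out.write(chunk)
    return path, crc & 0xFFFFFFFF

# Example: stage an in-memory "file" and verify the checksum afterwards.
with tempfile.TemporaryDirectory() as d:
    payload = b"visibility data" * 1000
    path, crc = stage_with_crc(io.BytesIO(payload), d, "obs.dat")
    assert crc == zlib.crc32(payload) & 0xFFFFFFFF
```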
Slide 11: Archiving simulation throughput (chart: total archive throughput in MB/s vs. simulated data rate per client in MB/s)
- A: 1 Gbps bandwidth for commissioning
- B: 12 clients / 4 servers on 6 / 2 Fornax nodes
- C: aggregated data-producing rate for 12 clients
- D: 12 clients / 4 servers on 6 iDataPlex / 1 Supermicro
- E: 12 clients / 2 servers on 6 iDataPlex / 1 Supermicro
- F: 24 clients / 4 servers on 6 / 1 Fornax nodes
- G: 24 clients / 4 servers on 24 / 2 Fornax nodes
- H: 24 clients / 4 servers on 12 / 2 Fornax nodes
- I: 24 clients / 4 servers on 28 Fornax nodes
- J: aggregated data-producing rate for 24 clients
Slide 12: Archive servers, CPU and file handling (charts at per-client rates of 16, 40, 48 and 56 MB/s, for 12 and 24 clients)
Slide 13: Data Storage: Cortex (32T) (diagram)
- Staging archive at ICRAR, Perth: NGAS DB, science DB, API and web portal; 1 Gbps → 10 Gbps link
- Charts: distribution of observation size; 32T disk archiving rate
Slide 14: Data Storage: Cortex (128T) (diagram)
- Online archive, NGAS DB, science DB, API and web portal; 1 Gbps link; charts: distribution of observation size; 128T disk archiving rate
- Background excerpt from a Sun article on LibSAM (developers.sun.com/solaris/articles/libsam.html), motivating HSM: traditional backup suffers from the time consumed to restore data in case of disaster and the high cost of resources required for storage, network, tape and administration. An alternative is the LibSAM library for the Sun StorageTek Storage Archive Manager (SAM), a paradigm for data management that goes beyond backup. Designed for use with Sun StorageTek SAM and QFS software, the LibSAM library (API) lets an application manage data in a samfs file system directly. The model is client/server; with LibSAM the server and client run on the same machine, so all requests are local and translate into system calls to the kernel. The four major components of the SAM archive management system are archiving, releasing, staging and recycling; archiving, the process of backing up a file by copying it from a file system to archive media (a removable media cartridge or a disk partition of another file system), is typically the first component.
Slide 15: Data Access: Cortex
- Client pull (synchronous) or server push (asynchronous)
- Pull: given a file identifier and version, get the file regardless of its location; if multiple replicas are found, follow host → cluster → domain → tape → mirrored archive
- Push: given a list of file identifiers and a deliver_to_url, sort the files, then push online files while staging offline ones from tape; supports suspend/resume/cancel and persistency; server-directed I/O enables better optimisation (queue management, aggregation of multiple requests, etc.)
- Data Processing Plug-Ins can be invoked prior to file retrieval, e.g. decompression, sub-setting, explicit staging; getting FITS header info → implicit staging → blocking
(charts: 32T access stats; release)
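The push path above ("sort files, then push online files while staging offline ones") amounts to a simple delivery plan: partition the requested list by whether a replica is already on disk, deliver the online set immediately, and queue tape staging for the rest. A toy sketch, with an assumed `is_online` predicate rather than the real NGAS replica lookup:

```python
def plan_push(file_ids, is_online):
    """Split a delivery request into files to push now and files to stage first.

    `is_online` is a hypothetical predicate answering whether any replica of
    the file sits on spinning disk rather than on tape.
    """
    push_now = [f for f in file_ids if is_online(f)]
    stage_first = [f for f in file_ids if not is_online(f)]
    return push_now, stage_first

# Online files go straight to deliver_to_url; offline ones are staged from
# tape in the background and pushed as they arrive.
online, offline = plan_push(["a", "b", "c"], lambda f: f != "b")
```

Handing this ordering decision to the server (server-directed I/O) is what allows the queue management and request aggregation mentioned above.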
Slide 16: Access patterns: Cortex (32T) (charts: number of accesses, overall vs. first access, and inter-access times, for fields C118 Fornax A, C106 SUN tracks, C104 EOR fields and the W44 galactic plane field)
Slide 17: Access patterns: Cortex (32T), continued (chart)
Slide 18: Data Mirroring
- Publish/subscribe model: criteria → Subscription Filter Plugin
- Events: file ingestion, subscribe command, explicit triggering, etc.; subscribers can subscribe into the past
- Each subscriber is given a queue; each subscriber/queue is assigned a priority
- Transfer over HTTP / FTP / GridFTP (concurrent transfers, file pipelining, parallel connections); the target can be anything that speaks HTTP / FTP / GridFTP, e.g. python -m SimpleHTTPServer
- Failed deliveries are recorded in the back log and re-sent through explicit triggering once connections return to normal
- Supports proxy mode: bypass slow hops; in-transit data processing
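A subscription filter plugin can be pictured as a predicate evaluated against each file's metadata; only matching files enter that subscriber's queue. A minimal sketch follows; the metadata keys and the plugin signature are assumptions for illustration, not the real NGAS plugin API.

```python
def project_filter(file_meta, project="EOR"):
    """Hypothetical filter plugin: pass only files belonging to one project."""
    return file_meta.get("project") == project

def enqueue_for_subscriber(files, subscriber_queue, filter_plugin):
    """Apply the subscriber's filter to candidate files; in the real system a
    failed delivery would later land in the back log for re-sending."""
    for meta in files:
        if filter_plugin(meta):
            subscriber_queue.append(meta["file_id"])
    return subscriber_queue

files = [
    {"file_id": "f1", "project": "EOR"},
    {"file_id": "f2", "project": "GLEAM"},
]
q = enqueue_for_subscriber(files, [], project_filter)
```

Because each mirrored archive has its own filter and priority, the same ingestion event can fan out to different subscriber queues with different contents.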
Slide 19: Mirroring Cortex to MIT
- 12 streams; TCP window size 16.0 MB, RTT ~0.281 s
- Max send buffer 128 KB: 93.35 Mb/s overall
- Max send buffer 64 MB: 137.19 Mb/s overall, 47% higher
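The gain from enlarging the send buffer follows from TCP's window limit: one stream's throughput is capped at roughly window / RTT, and the bandwidth-delay product (BDP) says how much data must be in flight to fill the path. A quick calculation for the RTT quoted above; the 1 Gbps figure is an assumed path capacity, not a number from the slide.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps / 8 * rtt_s

def window_limited_rate_bps(window_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput given its window and RTT."""
    return window_bytes * 8 / rtt_s

RTT = 0.281  # seconds, from the measurement on this slide

# Filling an assumed 1 Gbps path at this RTT needs ~35 MB in flight ...
needed = bdp_bytes(1e9, RTT)
# ... so a 128 KB send buffer caps each stream at only a few Mb/s,
# while a 64 MB buffer comfortably exceeds the per-stream BDP.
cap_small_mbps = window_limited_rate_bps(128 * 1024, RTT) / 1e6
```

This is why long fat networks like Perth to MIT need large buffers or many parallel streams, both of which the mirroring subsystem uses.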
Slide 20: Data Re-processing (WIP)
- Requirements: stage data from Cortex to Fornax; run calibration and imaging pipelines on data placed on Fornax; archive calibrated data / image cubes back to Cortex (optional, not considered for 128T)
- Data staging strategies (when, how much, how frequently):
  - Decoupled from processing → no overlapping: staging is submitted by users as a separate job prior to the processing jobs
  - Loosely coupled with processing → some overlapping: staging is a task inside the same job and can be asynchronous
  - Tightly coupled with processing → intertwined: processing and data staging in coordination
- Data placement strategies (where; when to clean up / update):
  - Loosely coupled with processing → global file system (i.e. Lustre): user-friendly abstraction, POSIX API, easiest for developers
  - Tightly coupled with processing → application-optimised file system: exposes file block locations in file metadata, e.g. Google File System, HDFS
  - Dynamically coupled with processing → temporary data storage and pipeline co-design: cache storage, multi-level scheduling
(Stage / Process / Place)
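The loosely-coupled option, where staging overlaps processing inside the same job, can be sketched with a background prefetch thread: while the pipeline processes one item, the next ones are already being staged. This is a toy model of the coupling pattern, not the actual MWA workflow code.

```python
import queue
import threading

def staged_pipeline(items, stage, process, depth=2):
    """Overlap staging and processing: a stager thread keeps up to `depth`
    staged items ahead of the consumer."""
    q = queue.Queue(maxsize=depth)

    def stager():
        for item in items:
            q.put(stage(item))   # e.g. copy from Cortex tape to Fornax disk
        q.put(None)              # sentinel: no more data

    threading.Thread(target=stager, daemon=True).start()

    results = []
    while True:
        staged = q.get()
        if staged is None:
            break
        results.append(process(staged))  # e.g. calibration / imaging
    return results

out = staged_pipeline([1, 2, 3], stage=lambda x: x, process=lambda x: x * 2)
```

The bounded queue is the key design choice: it limits how far staging runs ahead, which caps scratch-space usage and avoids the I/O contention the slide warns about.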
Slide 21: Data Re-processing on Fornax (diagram)
- Uniqueness of Fornax for MWA: GPU cluster → MWA Real-Time System (RTS); local storage (7 TB/node x 96 nodes) → RTS compute storage → in-storage processing; dual InfiniBand connections → RTS data movement
- Therefore, MWA has applied for 25% of the time on Fornax
- Fornax local storage is treated as a cache for MWA, analogous to SRAM/DRAM in a memory hierarchy: frequency splits on Fornax local storage, visibility files in the Cortex disk cache, backed by the Cortex tape libraries
Slide 22: Fornax block diagram (Fornax Architecture: Chris Harris)
Slide 23: Processing-optimised NGAS Cache Storage (diagram)
- The long-term archive and mirrored archives feed the Fornax cache storage via asynchronous staging (e.g. EOR data), overlaid on the Fornax block diagram
Slide 24: Conclusion
- Requirements: MWA data ingestion, transfer, storage, access, staging and re-processing
- NGAS: open (LGPL) software deployed in astronomy archive facilities around the globe
- NGAS tailored for MWA fulfils the above requirements and meets the MWA data challenge: "just access whole observations whenever I like, without going to file cabinets, getting tapes, mounting and then trying to figure out what to do" (an MWA Super Science Fellow)
- The NGAS-based MWA data archive solution is working well during commissioning
- Fornax: the major re-processing workhorse and data-staging hotspot for MWA; one of the few data-intensive clusters suited to MWA data re-processing
- What's next → science archive: data modelling (ObsCore), data access (VO), tools (ObsTAP, Saada, OpenCADC)
Redpaper Wei-Dong Zhu Markus Lorch IBM Content Collector Deployment and Performance Tuning Overview This IBM Redpaper publication explains the key areas that need to be considered when planning for IBM
Couchbase Server Under the Hood
Couchbase Server Under the Hood An Architectural Overview Couchbase Server is an open-source distributed NoSQL document-oriented database for interactive applications, uniquely suited for those needing
Oracle WebLogic Server 11g Administration
Oracle WebLogic Server 11g Administration This course is designed to provide instruction and hands-on practice in installing and configuring Oracle WebLogic Server 11g. These tasks include starting and
Cray DVS: Data Virtualization Service
Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with
POWER ALL GLOBAL FILE SYSTEM (PGFS)
POWER ALL GLOBAL FILE SYSTEM (PGFS) Defining next generation of global storage grid Power All Networks Ltd. Technical Whitepaper April 2008, version 1.01 Table of Content 1. Introduction.. 3 2. Paradigm
SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013
SAP HANA SAP s In-Memory Database Dr. Martin Kittel, SAP HANA Development January 16, 2013 Disclaimer This presentation outlines our general product direction and should not be relied on in making a purchase
Big Data Visualization with JReport
Big Data Visualization with JReport Dean Yao Director of Marketing Greg Harris Systems Engineer Next Generation BI Visualization JReport is an advanced BI visualization platform: Faster, scalable reports,
Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc [email protected]
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc [email protected] What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data
HADOOP PERFORMANCE TUNING
PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC
Database Monitoring Requirements. Salvatore Di Guida (CERN) On behalf of the CMS DB group
Database Monitoring Requirements Salvatore Di Guida (CERN) On behalf of the CMS DB group Outline CMS Database infrastructure and data flow. Data access patterns. Requirements coming from the hardware and
Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance
TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented
VMware vrealize Automation
VMware vrealize Automation Reference Architecture Version 6.0 or Later T E C H N I C A L W H I T E P A P E R J U N E 2 0 1 5 V E R S I O N 1. 5 Table of Contents Overview... 4 What s New... 4 Initial Deployment
Open Text Archive Server and Microsoft Windows Azure Storage
Open Text Archive Server and Microsoft Windows Azure Storage Whitepaper Open Text December 23nd, 2009 2 Microsoft W indows Azure Platform W hite Paper Contents Executive Summary / Introduction... 4 Overview...
GraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH
Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH CONTENTS Introduction... 4 System Components... 4 OpenNebula Cloud Management Toolkit... 4 VMware
BEST PRACTICES FOR INTEGRATING TELESTREAM VANTAGE WITH EMC ISILON ONEFS
Best Practices Guide BEST PRACTICES FOR INTEGRATING TELESTREAM VANTAGE WITH EMC ISILON ONEFS Abstract This best practices guide contains details for integrating Telestream Vantage workflow design and automation
