Optimising NGAS for the MWA Archive
C Wu · A Wicenec · D Pallot · A Checcucci

Received: date / Accepted: date

Abstract The Murchison Widefield Array (MWA) is a next-generation radio telescope, generating visibility data products continuously at about 400 MB/s. Efficiently managing and archiving this data is a challenge. The MWA Archive consists of dataflows and storage sub-systems distributed across three tiers. At its core is the open source software the Next-Generation Archive System (NGAS), initially developed at ESO. To meet the MWA data challenge, however, we have tailored and optimised NGAS to achieve high-throughput data ingestion, efficient dataflow management, multi-tiered data storage and processing-aware data staging.

Keywords Murchison Widefield Array · MWA Archive · Data-intensive Computing · In-storage Processing

1 Introduction

The Murchison Widefield Array (MWA) is optimised for the observation of extremely wide fields at frequencies between 80 and 300 MHz [1][16]. The MWA works without any moving parts: beams are formed by electronically adjusting the phase of the individual dipole antennas, 16 of which form a small phased array known as a tile. The complete array of 128 tiles provides about 2000 m^2 of collecting area at 150 MHz. Its instantaneous bandwidth is 30.72 MHz, with a spectral resolution of 40 kHz and a time resolution of 0.5 seconds. The field of view of the MWA is extraordinary: between 200 and 2500 square degrees.

C Wu, A. Wicenec, A. Checcucci
International Centre for Radio Astronomy Research, University of Western Australia
[chen.wu, andreas.wicenec, alessio.checcucci]@icrar.org

D. Pallot
International Centre for Radio Astronomy Research, Curtin University
[email protected]
Fig. 1 The MWA dataflow consists of three tiers: Tier 0 is where data is produced, Tier 1 is where data is archived, and Tier 2 is where data is distributed.

The entire MWA dataflow (Figure 1) consists of three tiers. Tier 0, co-located with the telescope at the Murchison Radio-astronomy Observatory (MRO), includes Online Processing and the Online Archive. Radio-frequency voltage samples collected from each tile (top-right corner of Figure 1) are transmitted to the receiver, which streams digitised signals at an aggregated rate of 320 Gbps to Online Processing. Online Processing includes FPGA-enabled Polyphase Filter Banks (PFB) and GPU-enabled software correlators. The correlators output in-memory visibilities that are immediately ingested by the DataCapture system. DataCapture produces memory-resident data files and uses an NGAS client to push files to the Online Archive managed by the NGAS server.

Each visibility is a complex number representing the amplitude and phase of a signal at a particular point in time within a frequency channel. For a given channel at each time step, the correlator carries out (N(N+1)/2) × (2×2) = 2N(N+1) pair-wise cross-correlations and auto-correlations [11], where N = 128 is the number of tiles, and the factor 2×2 indicates that polarised samples of orthogonal directions are cross-correlated. Since a visibility is a complex number (2 × 32 bits = 8 bytes), the aggregated data rate amounts to 2N(N+1) × 8 × (1/T) × M bytes per second, where T is the time resolution (0.5 seconds) and M is the total number of frequency channels: 30.72 MHz bandwidth / 40 kHz spectral resolution = 768. Hence the correlators under the full 128-tile configuration produce visibility data at a rate of 387 MB/s. This requires 24 on-site GPU-based software correlators, each processing 32 channels and producing frequency-split visibilities at a rate of about 16 MB/s.
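The aggregate rate quoted above follows directly from the formula; as a sanity check, the arithmetic can be reproduced with a few lines of Python (plain arithmetic only, no MWA software assumed):

# Visibility data rate for the full 128-tile correlator, restating
# 2*N*(N+1) * 8 bytes * (1/T) * M from the text.
N = 128            # number of tiles
T = 0.5            # time resolution in seconds
M = 768            # frequency channels: 30.72 MHz / 40 kHz
BYTES_PER_VIS = 8  # one complex visibility: 2 x 32-bit values

rate = 2 * N * (N + 1) * BYTES_PER_VIS * (1 / T) * M   # bytes per second
print(round(rate / 2**20))        # ~387 MB/s in aggregate
print(round(rate / 2**20 / 24))   # ~16 MB/s per correlator node (24 nodes)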
Visibilities are currently stored, along with several pieces of observation-specific metadata, as FITS image extensions. In addition to visibilities, it is envisaged that images produced by the real-time imaging pipeline [2] running on those GPU boxes will also be archived online.

At Tier 1 (Perth, a city 700 km south of the MRO), the Long-term Archive (LTA) periodically ingests the visibility data stream from the Online Archive (OA) via a 10 Gbps fibre-optic link (the shaded arrow from OA to LTA in Figure 1), which is part of the National Broadband Network (NBN) Australia. The dotted arrow in Figure 1 between OA and LTA represents the transfer of metadata on instruments, observations, and monitor & control information (e.g. telescope temperature). This transfer is implemented as an asynchronous continuous stream supported by the cascading streaming replication environment in PostgreSQL 9.2. The LTA storage facility, the Petabyte Data Store (PBStore), is a combination of magnetic disks and tape libraries provided by iVEC, a Western Australian organisation with a mission to foster scientific and technological innovation through the provision of supercomputing and eResearch services. From late July 2013, the LTA facility will be gradually migrated to the Pawsey Centre [12], a world-class supercomputing centre managed by iVEC with the primary aim of hosting new supercomputing facilities and expertise to support Square Kilometre Array (SKA) pathfinder research, geosciences and other high-end science.

Both data and metadata at the LTA will be selectively transferred to a set of Mirrored Archives (MAs) at Tier 2. The transfer link between the LTA and the MAs is optimised by the Proxy Archive located at the International Centre for Radio Astronomy Research (ICRAR). An additional copy of the data at the LTA can be moved to Offline Processing for compute jobs such as calibration and imaging. iVEC provides the Offline Processing facility, Fornax, a 96-node GPU-based compute cluster located at the University of Western Australia.

Tier 2 currently has two Mirrored Archives: one at the Massachusetts Institute of Technology (MIT), USA, and the other at Victoria University of Wellington (VUW), New Zealand. Data in transit from Tier 1 to Tier 2 is carried by the Australian Academic and Research Network (AARNet) across the Pacific Ocean. These MAs host a subset of the data products originally available in the LTA and provide processing capabilities for local scientists to reduce and analyse data relevant to their research projects. While the LTA in Tier 1 periodically pushes data to the MAs in Tier 2 in an automated fashion, ad-hoc data transfers from the LTA to a Tier 2 machine can also be scheduled via User Interfaces (UI). Web interfaces and Python APIs are already available for MWA scientists to either synchronously retrieve or asynchronously receive raw visibility data. We are currently developing interfaces compliant with several IVOA standards (e.g. SIAP [13], TAP [14]) to access MWA science-ready data products such as images (or image cubes) and catalogues.

2 MWA Archive Requirements and NGAS

A set of high-level requirements of the MWA archive is summarised as follows:

- High-throughput data ingestion (about 400 MB/s) at the telescope site in Western Australia
- Efficient distribution of large datasets to multiple geographical locations on three continents
- Secure and cost-effective storage of 8 Terabytes of data collected daily
- Fast access to the science archive for astronomers across Australia, New Zealand, India, and the USA
- Intensive processing (calibration, imaging, etc.) of archived data on GPU clusters
- Continuous growth of data volumes (3 Petabytes a year), data varieties (visibility, images, catalogues, etc.), and hardware environment (e.g. from iVEC's PBStore to the Pawsey supercomputing centre)

To fulfil these requirements, we developed the MWA Archive, a sustainable software system that captures, stores, and manages data in a scalable fashion. At its core is a Python-based open source software package, the Next-Generation Archive System (NGAS), initially developed at the European Southern Observatory [3]. The NGAS software is deployed throughout the MWA dataflow, including all archives (Online, Proxy, Mirrored, Long-term and Offline Processing) shown in Figure 1. As open source software, NGAS has previously been used in several observatories, including the ALMA Archive [4][15], the NRAO EVLA archive, and the La Silla Paranal Observatory [3]. Distinct features of NGAS include object-based data storage that spans multiple file systems distributed across multiple sites, a scalable architecture, hardware-neutral deployment, loss-free operation and data consistency assurance. For example, NGAS manages file metadata, checking data and, where possible, replicating data to another storage location as fast as possible in order to secure it against loss. Through its flexible plugin framework, NGAS enables in-storage processing that moves computation to data.

While NGAS provides a good starting point for developing the MWA Archive, substantial optimisation efforts are required to meet the MWA data challenges. First, the data ingestion rate of the MWA telescope at the MRO (about 400 MB/s) is almost an order of magnitude higher than in any existing NGAS deployment, ALMA being the highest at 66 MB/s. It was uncertain whether the existing NGAS architecture would scale to cope with this data rate. Second, the MWA Archive requires efficient flow control and throughput-oriented data transfer to both the LTA and the MAs. Given the high data volume, the dataflow must saturate the bandwidth provided by the public network and, if possible, optimise data routing. Third, for cost-effective storage, the archive software needs to integrate well with the commercial Hierarchical Storage Management (HSM) system. For example, the MWA Archive needs to interact with the HSM for optimal data staging and replacement based on user access patterns. Finally, processing of archived data is essential for the MWA to produce science-ready data products. The MWA Archive shall therefore support processing-aware data staging: moving data from the LTA to Offline Processing (i.e. the Fornax GPU cluster) in a way that optimises the parallelism between data movement and data processing. In addition, data management for Offline Processing is needed to reduce excessive network I/O contention during data processing. The remainder of this paper discusses the optimisations we have made to NGAS to meet these four major challenges.
Fig. 2 Dataflows inside DataCapture. Blue arrows represent visibility data pushed from one step to the next, dark arrows represent visibilities pulled from the previous step, and red arrows represent the metadata flow. The NGAS Client sends file metadata to the Monitor and Control (M&C) Interface and pushes files to the Online Archive, which consists of two local NGAS servers.

3 NGAS Optimisation and Adaptation

3.1 Data ingestion

Data ingestion involves fast data capturing and efficient online archiving. Moreover, it requires effective in-memory buffer management and in-process integration with the software correlator. For this purpose, we developed DataCapture, a throughput-oriented, multi-threaded C++ program with fault-tolerant stream handling and admission control, as shown in Figure 2. DataCapture follows a plugin-based architecture, in which DataHandlers are registered with the DataCaptureMgr to transfer and transform data of various formats (e.g. FITS, HDF5, CASA, etc.). DataCaptureMgr manages a memory buffer, in which buckets of memory can be dynamically allocated and deallocated by the DataProducer (i.e. the correlator) and the DataHandlers. The jigsaw-puzzle pieces in Figure 2 represent the plugin interfaces by which data handlers register themselves with DataCaptureMgr. An important design decision was to make data ingestion largely pull-based (dark arrows in Figure 2) via the StagingArea, where feedback throttling control and file handling (e.g. filtering, header augmenting, format-aware file chunking, etc.) can be applied asynchronously without interfering with upstream I/O operations.
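Although DataCapture itself is written in C++, the registration and pull-based hand-off it implements can be illustrated with a short sketch, shown here in Python; the class and method names simply mirror Figure 2 and are not the actual plugin interfaces:

import queue

class DataCaptureMgr:
    # Toy illustration of the pull-based design: the producer pushes memory
    # buffers into a bounded staging queue, and registered handlers pull from
    # it at their own pace, so throttling never blocks the correlator's
    # upstream I/O.
    def __init__(self, max_buffers=64):
        self.staging = queue.Queue(maxsize=max_buffers)   # the StagingArea
        self.handlers = []                                # e.g. FITS, HDF5, CASA handlers

    def register(self, handler):
        # Plugin interface: any object exposing a handle(buffer) method.
        self.handlers.append(handler)

    def push(self, buffer):
        # Called by the DataProducer (the correlator); blocks only when the
        # buffer pool is exhausted, providing simple admission control.
        self.staging.put(buffer)

    def run(self):
        while True:
            buf = self.staging.get()      # handlers pull asynchronously
            for h in self.handlers:
                h.handle(buf)             # filter, augment headers, chunk, archive, ...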
The Online Archive is a cache-mode NGAS optimised for ingesting high-rate data streams. Note that data ingestion is more than just quickly storing data on some disk arrays: it also involves metadata management, online checksum computation, consistency checking, and scheduling data transfer via the Wide Area Network (WAN) to the LTA as fast as possible in order to (1) secure it against loss, and (2) vacate more online disk space for buffering. We added a direct triggering mechanism to NGAS so that files can be explicitly scheduled for removal upon successful delivery while respecting the existing cache-purging rules (e.g. capacity, age, etc.). The hardware of the Online Archive includes two Supermicro nodes, each with 4 quad-core 2.40 GHz Intel Xeon CPUs, 64 GB of memory, and a 10 Gb Ethernet network host adapter. The storage attached to each node contains 24 Serial-Attached SCSI (SAS) hard disk drives configured as RAID5, providing a total capacity of 80 TB.
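One aspect of the ingestion path worth illustrating is the online checksum: the CRC is computed while the data is being written, rather than by re-reading the file afterwards. A minimal sketch of this pattern (not the actual NGAS code) using Python's standard zlib module:

import zlib

def write_with_crc(src, dst_path, block_size=1 << 22):
    # Stream the file-like object 'src' to 'dst_path', updating a CRC32 on
    # each block so that no second read pass over the data is needed.
    crc = 0
    with open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF   # unsigned 32-bit value stored with the file metadata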
Fig. 3 Data ingestion at the Online Archive. Each one of the five parallel streams records an ingestion rate over 75 MB/s. The aggregated ingestion rate amounts to 382 MB/s.

Figure 3 shows the data ingestion rate recorded at the Online Archive during the early stage of 128-tile commissioning, when five correlator (iDataPlex dx360 GPU) nodes were being tested, producing the full data rate (each handling some 153 channels). Each one of the five lines represents a data stream being ingested along the time X-axis (seconds). The Y-axis measures the sustained ingestion data rate (MB/s) achieved for each data stream. The fact that all five lines are roughly aligned on both ends suggests that the ingestion streams are parallel, yielding an aggregated ingestion rate of around 5 × 76.5 ≈ 382 MB/s. It also shows that the variance of data ingestion for a single stream is rather small: less than about 1 MB/s. Note that this result was achieved on a single NGAS server at the Online Archive, which has two NGAS servers, so each NGAS server only needs to handle half of the total ingestion load. It is also worth noting that the recorded mean ingestion rate of 76.5 MB/s is not a limit imposed by NGAS; it corresponds to the maximum rate at which each one of the five correlators can possibly produce data, about 387/5 = 77.4 MB/s.

A careful examination of Figure 3 reveals that each line consists of three consecutive sections. The first section, T_p, records the time spent on I/O preparation: checking data headers and selecting the appropriate storage medium. The second section, T_m, the largest, records the time spent on disk writes and online Cyclic Redundancy Check (CRC) computation. The third section, T_l, records the time spent on post-processing, including computing the final destination, quality checking, moving data from the staging area to the final destination, metadata registration, and triggering the file delivery event. For all five lines, T_p is almost negligible. Compared to T_m, T_l is relatively trivial in all five cases, indicating that the post-processing overhead is also very small. This result shows that our data ingestion is optimal: sequential access (T_m) dominates the I/O.

3.2 Dataflow management

The NGAS software uses a subscription-based mechanism to facilitate dataflows, synchronising data across multiple sites. A data provider has multiple subscribers, each subscribing to data products that satisfy a subscription condition. Whenever a subscription event is triggered (for example, a new file is ingested into the data provider), a subscriber will receive all data that meet its subscription condition. A subscription condition can be as simple as a timestamp (e.g. "data ingested since last Friday"). It can also contain complex rules written in an NGAS subscription filter plugin.

While NGAS subscription provides a simple and effective method to control the dataflow, it lacks a dedicated data transfer service for bandwidth saturation, network optimisation, etc., which are essential for managing sustained dataflows that move multiple terabytes of data each day across three continents. We therefore made several changes to the existing subscription mechanism.

First, a single dataflow now supports multiple concurrent data streams, multiplexing the same TCP link to maximise the AARNet bandwidth utilisation. This has significantly improved the measured cross-Pacific data transfer rate from 20 Mbps (a single stream) to 160 Mbps (12 streams), and to 424 Mbps (32 streams), saturating almost half of the maximal bandwidth provisioned by the 1 Gbps AARNet link. Considering AARNet is a public WAN shared by many users, this transfer rate is reasonably good. In addition to the improvement in the transfer rate, the transfer service is also configurable. For example, the number of streams is adjustable at runtime. Our current research aims to dynamically control this number based on real-time information on network usage. We have also started integrating the GridFTP library into the NGAS data transfer module for advanced features such as concurrent transfer of file splits and file pipelining. The data subscribers can be any network endpoint that supports one of three protocols: HTTP, FTP, and GridFTP.
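The subscription filter plugins mentioned above are essentially predicates evaluated against each candidate file; only files for which the predicate returns True are delivered to that subscriber. The sketch below is a hypothetical example of such a predicate (the argument names are illustrative, not the exact NGAS plugin signature):

from datetime import datetime, timedelta

def mwa_recent_visibility_filter(file_id, file_version, ingestion_date, plugin_pars):
    # Hypothetical filter: match visibility FITS files ingested within the
    # last 'max_age_days' days (a parameter supplied in the subscription).
    max_age_days = int(plugin_pars.get("max_age_days", 7))
    recent = datetime.utcnow() - ingestion_date < timedelta(days=max_age_days)
    return file_id.endswith(".fits") and recent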
Second, we developed a new NGAS command, PARCHIVE, that enables an NGAS server to act as a Proxy Archive which, as shown in Figure 1, directly relays a data stream from one network to the next. It works as follows: a client wishing to transfer a file to destination D sends the file data via an HTTP POST request (whose URL points to the PARCHIVE command) to the Proxy Archive. The Proxy Archive simply reads incoming data blocks into a memory buffer and sends outgoing data blocks from the buffer to D's URL, which is a parameter specified in the header of the original HTTP POST request. This is useful when the existing network environment cannot meet the dataflow requirements. For example, the MWA LTA is physically located at PBStore, which provides a storage front-end node called Cortex that is shared by many iVEC users moving large volumes of data from/to PBStore every day. While iVEC has 10 Gb bandwidth between all internal network nodes, Cortex could become a potential bottleneck between the iVEC network and the public network. We therefore set up a Proxy Archive at ICRAR, which sits on the same 10 Gb iVEC internal network but has few shared users moving data to/from the outside world. As a result, MWA data streams bypass Cortex and use a network endpoint exclusive to the MWA to reach remote destinations. Although this relay mechanism could easily be built into the network layer using true proxy servers, placing it in the application layer has an advantage: by writing plugins for the PARCHIVE command, we can perform in-transit processing, transforming (e.g. compressing, sub-setting, aggregating, etc.) data on the move. This processing-inside-the-network approach not only capitalises on the inherent cost of data movement, but potentially reduces storage capacity requirements, since no persistent storage is required for in-transit processing.

Third, we created a new NGAS running mode, Data Mover. An NGAS node running in Data Mover mode is a stripped-down version of a regular NGAS server optimised for dataflow management. A Data Mover is able to perform all dataflow-related tasks (such as data transfer, subscription management, etc.) on behalf of a set A_normal of normal-mode NGAS servers, as long as it has read access to their archived files on storage media. This is achieved by (1) disabling all NGAS commands (e.g. ARCHIVE) that write data into NGAS, which leaves a Data Mover completely read-only, (2) instructing the Data Mover to periodically detect changes in the NGAS file metadata associated with A_normal, and (3) applying subscription conditions to the Data Mover against data archived by A_normal. In this way, changes in file metadata that meet some subscription condition will trigger the Data Mover to transfer the affected files from the storage media managed by A_normal to external/remote subscribers. This not only removes the burden of dataflow management from A_normal, but also reduces the coupling between dataflow management and data archiving, separating them into two different processes. An immediate benefit is that, as both archiving and dataflow become increasingly complex, one can change (or even shut down) one service without interfering with the other, a key requirement for building a scalable and sustainable MWA Archive. Furthermore, we can deploy Data Movers as a secure, read-only data management layer that shields internal NGAS nodes from being directly contacted by end users.
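To make the relay idea behind PARCHIVE concrete, the following is a minimal, hypothetical sketch of an application-level relay: blocks from an incoming HTTP POST are forwarded to a destination URL taken from a request header, without touching persistent storage. It illustrates the concept only and is not the NGAS implementation (the header name X-Destination-Url is invented for the example):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse
import http.client

BLOCK_SIZE = 1 << 20   # 1 MB relay buffer

class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        target = urlparse(self.headers["X-Destination-Url"])   # destination D

        # Open the outgoing connection and forward the body block by block.
        conn = http.client.HTTPConnection(target.netloc)
        conn.putrequest("POST", target.path or "/")
        conn.putheader("Content-Length", str(length))
        conn.endheaders()
        remaining = length
        while remaining > 0:
            block = self.rfile.read(min(BLOCK_SIZE, remaining))
            if not block:
                break
            # An in-transit processing plugin could hook in here to inspect or
            # transform the block before it is sent on.
            conn.send(block)
            remaining -= len(block)

        resp = conn.getresponse()
        self.send_response(resp.status)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 7777), RelayHandler).serve_forever()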
3.3 Data storage and access

The overall architecture of the MWA Long-term Archive is shown in Figure 4. Data on PBStore is managed by the Hierarchical Storage Management (HSM) system, Oracle SAM-FS, which operates on three levels of storage media: Cortex RAM, disk arrays, and the tape library. Cortex is the front-end machine that has access to data held at PBStore via the POSIX-compliant file system interface provided by SAM-FS.

Fig. 4 Data storage and access at the Long-term Archive.

The Online Archive at the telescope site streams data to NGAS archive servers running on Cortex via the WAN (10 Gb). The NGAS servers write this data onto the disk storage using the standard POSIX interface. Throughout the data life cycle, the HSM issues data migration requests (archive, stage, and release), moving data across the storage hierarchy. Such movement is either implicitly triggered by POSIX operations (e.g. fread()) or explicitly scheduled based on data migration policies. While the HSM system automates data migration and abstracts data movement away from application programs, it also provides API libraries that allow applications to directly manipulate data movement between disks and tape libraries. This enables NGAS to optimise data replacement and data staging at the LTA.

Server-directed I/O

When applications access data files using POSIX interfaces (fread, fseek, etc.), a missed hit on the disk cache will trigger the HSM to schedule a stage request that moves data from tapes to disks. This in turn causes the HSM to block the current thread for a period of time whose duration is non-deterministic in a shared environment like PBStore. In the case of the MWA LTA, this might lead to an HTTP timeout or even halt other NGAS threads, since not all POSIX operations gracefully release system resources during I/O wait. To tackle this issue, we added a new NGAS command for asynchronous data access. In the new scheme, an astronomer lists the files they wish to access (e.g. all files related to some observations or within a piece of sky) and a URL that can receive these files (e.g. ftp://mydomain/fileserver). After verifying the location of each file, NGAS simultaneously launches two threads: a delivery thread that delivers all disk-resident files in the list to this URL, and a staging thread that stages all tape-resident files by explicitly issuing stage requests via the HSM API. The staging thread monitors the staging progress and notifies the delivery thread of any new disk-resident files that have become deliverable.
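The coordination between the two threads can be pictured as a producer/consumer pair sharing a queue of deliverable files. The sketch below captures the idea only; hsm_stage, on_disk and deliver are stand-ins for the HSM API and the delivery code, not real calls:

import queue
import time

deliverable = queue.Queue()    # files that have become disk-resident
DONE = object()

def staging_thread(tape_files, hsm_stage, on_disk):
    # Issue explicit stage requests, then hand files over as they reach disk.
    for f in tape_files:
        hsm_stage(f)                      # non-blocking stage request (stand-in)
    pending = set(tape_files)
    while pending:
        for f in [f for f in pending if on_disk(f)]:
            deliverable.put(f)            # notify the delivery thread
            pending.remove(f)
        time.sleep(10)                    # poll the staging progress
    deliverable.put(DONE)

def delivery_thread(disk_files, deliver):
    # Deliver disk-resident files immediately, then whatever staging hands over.
    for f in disk_files:
        deliver(f)
    while True:
        f = deliverable.get()
        if f is DONE:
            break
        deliver(f)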
Fig. 5 The DAPA service produces the empirical distribution of the number of days between consecutive accesses recorded at the Long-term Archive for two science projects during a 140-day period (cumulative probability versus access interval in days, for Project A and Project B).

Since NGAS now explicitly directs the staging process before performing any I/O tasks on tape-resident files, there is no forced I/O blocking that can affect other NGAS worker threads. Moreover, such server-directed I/O [7] leaves much more room for optimisation. For example, we let the HSM system reorder the files in the staging list such that files close to one another on the same tape volume are staged together, reducing many random seeks to fewer sequential reads. This reduction is also possible for different users whose file access lists have a common subset. In addition to server-directed staging, we are investigating server-directed archiving using the declustering technique: files within a contiguous (temporal, spatial or logical) region will be evenly distributed over multiple tape volumes to enable fast parallel staging.

Data access pattern analysis

The analysis of data access patterns becomes essential when the number of missed file hits on the HSM disk cache becomes increasingly large. Although HSM data migration policies based on certain criteria (e.g. high/low watermark, file age/size/type, modified-since-timestamp) can be useful in general, they have several drawbacks. First, policies more or less reflect assumptions made by users based on past experience, which could lead to inaccurate assessments of how data is actually being used. An inaccurate parameter in a migration policy could lead to constant system churn, in which, for example, files that have just been released from disk are asked to be staged back from tape in the next minute. Second, data access patterns evolve over time, and they evolve differently for different types of data and different user groups. Migration policies need to be regularly revisited to reflect such changes, which should be carried out in an automated fashion.
Last, it is also questionable whether the expressiveness of a predefined migration policy schema can accommodate migration rules based on the content of the data (e.g. FITS header keywords) rather than on its metadata (such as file size, age, etc.).

We therefore developed the Data Access Pattern Analysis (DAPA) service based on the existing NGAS logging facility. This service periodically collects raw log files generated by NGAS servers at remote sites, parses their textual descriptions, and converts them into structured records stored in a relational database. DAPA is currently provisioned as a Web service that accepts REST requests to plot various data access statistics. Figure 5 shows the empirical distribution of the length of access intervals for two science projects in a 140-day period during MWA commissioning. The X-axis represents the number of days between two consecutive accesses of a file. The two projects appear considerably different, which suggests that content-specific data migration policies can be more optimal. For instance, when the HSM disk cache is running low, we can temporarily release from disk those Project B files that have not been accessed for a month. On the other hand, we could pre-stage to disk those tape-resident Project A files that have not been accessed for two months, given the sharp increase in inter-access probability between 60 and 90 days. Note that we cannot predict precisely how data will be accessed in the future, since the true process governing which data astronomers access at a given time is overly complex and hence cannot be modelled as such. Our approach assumes that, when many scientists access the same data archive over an extended period of time, the overall process can be considered random and hence can be simplified using probability models. The development of the DAPA service is a proof-of-concept of this approach and is still at an early stage. As a next step, we will fit various probability models to these empirical distributions. Ultimately, these fitted models and other useful principles can establish objective functions (e.g. minimise disk cache usage, expected access latency, miss rate, etc.), re-casting data migration as an optimisation problem.
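The statistic behind Figure 5 is simply the distribution of gaps between consecutive accesses of the same file. Given access records parsed out of the NGAS logs, it can be computed as below (the (file_id, day) record format is assumed for illustration; the real DAPA records live in a relational database):

from collections import defaultdict

def inter_access_intervals(records):
    # records: iterable of (file_id, access_time_in_days) tuples.
    # Returns the sorted inter-access intervals, from which an empirical CDF
    # such as Figure 5 is plotted: y = (i + 1) / len(intervals) at x = intervals[i].
    by_file = defaultdict(list)
    for file_id, t in records:
        by_file[file_id].append(t)
    intervals = []
    for times in by_file.values():
        times.sort()
        intervals.extend(b - a for a, b in zip(times, times[1:]))
    intervals.sort()
    return intervals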
3.4 Data staging & processing

This section discusses data management for running MWA data processing pipelines on a GPU cluster, Fornax [8]. Fornax is a data-intensive SGI supercomputer comprising 96 compute nodes, each containing two 6-core Intel Xeon X5650 CPUs with 72 GB RAM and an NVIDIA Tesla C2075 GPU with 6 GB RAM. To the MWA Archive, the distinct data-intensive features of Fornax are: (1) in addition to the traditional global storage (500 TB managed by the Lustre filesystem), each compute node has a directly-attached RAID-0 disk array with a capacity of 7 TB, which amounts to a total of 672 TB of local storage; (2) two InfiniBand networks (40 Gbps), the first used for the Message Passing Interface (MPI) interconnect and the global storage network, and the second dedicated to data movement across local disks on neighbouring compute nodes; and (3) two Copy Nodes, each with a dedicated 10 Gb network link to the outside world.

Fig. 6 Data staging and processing on the Fornax cluster, which includes 96 compute nodes, 2 Copy nodes, 6 Lustre storage nodes, 2 Lustre metadata server nodes, and so on. Each compute node has 7 TB of local storage. Both the NGAS code and the data processing pipeline are kept on the storage nodes and are accessible and executable from within a compute node via the Lustre shared file system. Initially, NGAS servers are launched by the cluster job scheduler on compute nodes, which are dynamically allocated for processing NGAS jobs. Next, scientists (bottom right) issue data processing requests to the JobMAN Web service via browsers or Python libraries. JobMAN analyses the request and queries the related NGAS servers on compute nodes. For data only available at the LTA, JobMAN sends a data staging request to the LTA NGAS servers. The LTA arranges data migration from the tape libraries and streams the data to the NGAS server running on a Copy node, which uses the PARCHIVE command to forward the data stream to a compute node where it will be processed by an NGAS processing plugin. For data already kept on Fornax, JobMAN schedules data movement directly from one NGAS server to another where processing will take place. Once all data required by a task in a processing pipeline is available on a compute node, the NGAS server executes (gears with green arrows) that task after loading the executable onto the compute node, and returns the results to JobMAN for scientists to retrieve.

Two important goals guided our design: (1) to create overlaps between data staging and data processing, and (2) to utilise the 7 TB of local storage attached to each compute node. The first goal aims to achieve a higher level of parallelism between I/O and computation. Nevertheless, this is not straightforward on most of today's supercomputing clusters (including Fornax), where scientists often have to launch two separate jobs: one for data staging and the other for processing. The processing job typically waits for the staging job to finish copying all required data from remote archives to the cluster. Introducing overlaps between these two jobs allows some processing to be done while data is being staged, thus reducing the total completion time by hiding I/O behind computation. The second goal aims to achieve satisfactory I/O performance with low variability during data staging. Unlike traditional compute-intensive clusters, where all user data is centrally stored and managed by a global storage network consisting of several dedicated storage nodes, data-intensive clusters (e.g. Fornax and DataScope [17]) provide compute-attached disk arrays.
Since writing data to local disks does not involve any network I/O, which is somewhat volatile considering that many applications/users share the cluster resources at any given time, applications using local storage can thus achieve good I/O performance with a smaller variance. In fact, the aggregated I/O throughput across all 96 compute nodes is of the order of 40 GB/s, far beyond what a centralised shared file system can normally reach. Furthermore, as local disk arrays are evenly distributed across the entire cluster, they offer better load balancing and scalability, and are less susceptible to a single point of failure. However, traditional cluster job schedulers rely solely on a shared file system to manage the global storage and lack the capability to provide applications with access to node-attached local disks. As a result, these high-performance local storage resources on Fornax are often under-utilised.

Figure 6 depicts the overall solution for achieving the above two goals. At its core is the NGAS Job Manager (JobMAN), a Web service that schedules job executions on NGAS servers on Fornax compute nodes. To interleave processing with data staging, JobMAN decomposes a processing request into small tasks that can be independently executed in parallel. Such decomposition is domain specific. In the case of MWA imaging/calibration, the visibility data in each observation is split into a number of sub-bands, each containing a fixed number of frequency channels. Data in different sub-bands are staged and processed in different tasks on given compute nodes. To select an ideal compute node for processing a task, JobMAN queries the Fornax NGAS database to obtain data locality information on local storage, and triggers any necessary data movement (e.g. from the LTA to a compute node, or from one compute node to another) via the NGAS subscription and asynchronous retrieval features discussed in Section 3.2.

To utilise the local storage, we essentially used a two-level (cluster job scheduler and JobMAN scheduler) scheduling technique [6]. In the first level, NGAS servers acquire compute node resources through the normal cluster job scheduler on Fornax, PBS (Portable Batch System) Pro. Once resources are acquired, in the second level, JobMAN is in charge of matching tasks with the correct data on some compute nodes, and moves data if necessary using the NGAS dataflow management system (Section 3.2). In this way, each task is paired with the required data on its local disks, which cannot be utilised by a shared file system or traditional job schedulers (e.g. PBS Pro). In summary, NGAS JobMAN allows for parallel execution of staging and processing, reducing the overall time-to-value for data products. It also reduces network I/O contention by restricting each task to local storage resources. We are currently working on numerous interesting topics in this line of work, including the avoidance of hotspots during NGAS job scheduling that requires partial remote I/O access, optimal staging strategies [5], and dealing with iterative workflows with data dependencies and intermediate data products.
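The decomposition and locality matching described above can be sketched as follows; the function and attribute names (nodes_holding, queue_len, and the returned staging action) are hypothetical and only illustrate the JobMAN idea, not its API:

def decompose(obs_id, n_channels=768, subband_width=32):
    # One task per sub-band; tasks are independent and can run in parallel.
    return [
        {"obs_id": obs_id, "first_chan": c, "n_chan": subband_width}
        for c in range(0, n_channels, subband_width)
    ]

def schedule(task, locality_db, nodes):
    # Prefer a compute node that already holds the task's sub-band on its
    # local disks; otherwise pick the least-loaded node and schedule staging
    # (via NGAS subscription / asynchronous retrieval) before execution.
    local = locality_db.nodes_holding(task["obs_id"], task["first_chan"])
    for node in nodes:
        if node.name in local:
            return node, None
    target = min(nodes, key=lambda n: n.queue_len)
    return target, ("stage", task["obs_id"], task["first_chan"], target.name)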
4 Conclusion

As a precursor project to SKA-Low [9], the MWA is a next-generation radio telescope that generates large volumes of data, opening exciting new science opportunities [10]. The MWA Archive has stringent requirements on data ingestion, transfer, storage & access, staging and processing. NGAS is a feature-rich, flexible and effective open source archive management software package that has been used for several large telescopes in both the optical [3] and radio [15] astronomy communities. To tackle the MWA data challenge, however, we needed to tailor and optimise NGAS to meet all these requirements. In this paper, we discussed these optimisations and evaluated the methods we have used to achieve high-throughput data ingestion, efficient dataflow management, multi-tiered data storage and optimal processing-aware staging.

Thus far, the MWA Archive has met the requirements during the commissioning phase. For example, the Online Archive is able to cope with the full ingestion data rate. The MWA dataflow has synchronised data across three tiers and saturated the available network bandwidth to maximise data distribution efficiency. The turnaround time from observing an area of the sky to data products appearing on the Tier 2 hosts can be reduced to minutes. This is achieved by moving about 3–4 TB of data from Western Australia to MIT, USA every day. The MWA LTA is currently being relocated to the new iVEC Pawsey supercomputing centre [12] for the full operation phase.

The entire MWA Archive software environment is rather generic: the Python-based NGAS software currently runs on both Linux-based (CentOS, SUSE Linux Enterprise, and Ubuntu Server) and UNIX-based (Solaris) operating systems. The NGAS software requires several standard third-party software libraries, such as python2.7-devel, autoconf, zlib, readline, openssl and so on, all of which are installed during the automated installation process using one of the standard distribution package managers such as RPM, YUM, Zypper or APT. The NGAS software has adopted a highly flexible, plugin-based software architecture in which the core does not make any special assumptions about the hardware, operating systems or file systems. As a result, all computing-environment-specific optimisations made for the MWA Archive (e.g. querying file location status on tape libraries, interaction with the HSM) are provided as NGAS plugins, which can be dynamically plugged into and removed from the NGAS core based on the run-time configuration.

During a period of sixteen months, four authors of this paper collectively contributed one Full-Time Equivalent (FTE) of time towards developing the MWA Archive, from hardware procurement & configuration and network setup & tuning to software implementation & testing, database configuration, and software deployment. The code base of the NGAS core software contains about 40,000 lines of Python code. For the MWA Archive, we have added about 5,500 lines of Python code, most of which is in the form of NGAS plugins, and 4,000 lines of C++ code to develop the DataCapture sub-system. The NGAS software has a Lesser GPL licence and can be obtained upon request. We are currently in the process of forming the NGAS user group, inviting members from around the world who are using NGAS as their archive management software. We envisage that the NGAS user group will release the software to the public domain in the future.

5 Acknowledgment

The authors would like to thank iVEC for providing the PBStore, Fornax and Pawsey supercomputing facilities. The Pawsey Centre is funded from Australian federal and Western Australian state grants. ICRAR is a joint venture between Curtin University and the University of Western Australia, and receives grants from the Western Australian Government.
The authors would like to acknowledge the excellent work of the original main developer of NGAS, Jens Knudstrup from ESO, without which this work would have been impossible. The authors would also like to thank the two anonymous reviewers for providing valuable feedback on the original manuscript.

References

1. C.J. Lonsdale, R.J. Cappallo, M.F. Morales, F.H. Briggs, L. Benkevitch, J.D. Bowman, J.D. Bunton, S. Burns, B.E. Corey, L. deSouza, et al., Proceedings of the IEEE 97(8), 1497 (2009)
2. S. Ord, D. Mitchell, R. Wayth, L. Greenhill, G. Bernardi, S. Gleadow, R. Edgar, M. Clark, G. Allen, W. Arcus, et al., Publications of the Astronomical Society of the Pacific 122(897), 1353 (2010)
3. A. Wicenec, J. Knudstrup, The Messenger 129, 27 (2007)
4. A. Wicenec, S. Farrow, S. Gaudet, N. Hill, H. Meuss, A. Stirling, in Astronomical Data Analysis Software and Systems (ADASS) XIII, vol. 314 (2004)
5. S. Bharathi, A. Chervenak, in Proceedings of the Second International Workshop on Data-aware Distributed Computing (ACM, 2009)
6. I. Raicu, Y. Zhao, I.T. Foster, A. Szalay, in Proceedings of the 2008 International Workshop on Data-aware Distributed Computing, DADC '08 (ACM Press, New York, New York, USA, 2008)
7. R.A. Oldfield, L. Ward, R. Riesen, A.B. Maccabe, P. Widener, T. Kordenbrock, in Cluster Computing, 2006 IEEE International Conference on (IEEE, 2006)
8. C. Harris, in Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (A*STAR Computational Resource Centre, 2012)
9. SKA, in timeallocation-on-ska-precursor-the-mwa-2/ (SKAtelescope.org, 2013)
10. J.D. Bowman, I. Cairns, D.L. Kaplan, T. Murphy, D. Oberoi, L. Staveley-Smith, W. Arcus, D.G. Barnes, G. Bernardi, F.H. Briggs, et al., Publications of the Astronomical Society of Australia 10, 28 (2013)
11. R.B. Wayth, L.J. Greenhill, F.H. Briggs, Publications of the Astronomical Society of the Pacific 121, 882 (2009)
12. HPCWire, in providing data management infrastructure at pawsey supercomputing centre.html (hpcwire.com, 2013)
13. D. Tody, R. Plante, (International Virtual Observatory Alliance, 2009)
14. P. Dowler, G. Rixon, D. Tody, (International Virtual Observatory Alliance, 2010)
15. A. Manning, A. Wicenec, A. Checcucci, J.A. Gonzalez Villalba, in Astronomical Data Analysis Software and Systems XXI, Astronomical Society of the Pacific Conference Series, vol. 461, ed. by P. Ballester, D. Egret, N.P.F. Lorente (2012)
16. S.J. Tingay, R. Goeke, J.D. Bowman, D. Emrich, S.M. Ord, D.A. Mitchell, M.F. Morales, T. Booler, B. Crosse, R.B. Wayth, C.J. Lonsdale, S. Tremblay, D. Pallot, T. Colegate, A. Wicenec, N. Kudryavtseva, W. Arcus, D. Barnes, G. Bernardi, F. Briggs, S. Burns, J.D. Bunton, R.J. Cappallo, B.E. Corey, A. Deshpande, L. deSouza, B.M. Gaensler, L.J. Greenhill, P.J. Hall, B.J. Hazelton, D. Herne, J.N. Hewitt, M. Johnston-Hollitt, D.L. Kaplan, J.C. Kasper, B.B. Kincaid, R. Koenig, E. Kratzenberg, M.J. Lynch, B. McKinley, S.R. McWhirter, E. Morgan, D. Oberoi, J. Pathikulangara, T. Prabu, R.A. Remillard, A.E.E. Rogers, A. Roshi, J.E. Salah, R.J. Sault, N. Udaya-Shankar, F. Schlagenhaufer, K.S. Srivani, J. Stevens, R. Subrahmanyan, M. Waterson, R.L. Webster, A.R. Whitney, A. Williams, C.L. Williams, J.S.B. Wyithe, Publications of the Astronomical Society of Australia 30, e007 (2013)
17. E. Givelberg, A. Szalay, K. Kanov, R. Burns, in Proceedings of the First International Workshop on Network-aware Data Management, NDM '11 (ACM, New York, NY, USA, 2011)
IBM Storwize V5000 Designed to drive innovation and greater flexibility with a hybrid storage solution Highlights Customize your storage system with flexible software and hardware options Boost performance
DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE
DSS Data & Diskpool and cloud storage benchmarks used in IT-DSS CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Geoffray ADDE DSS Outline I- A rational approach to storage systems evaluation
Network device management solution.
Network device management solution. iw Management Console Version 3 you can Scalability. Reliability. Real-time communications. Productivity. Network efficiency. You demand it from your ERP systems and
Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez
IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data
Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration
Solutions Integrated Storage Appliances Management Optimized Storage & Migration Archive Data Retention & Compliance Services Global Installation & Support SECURING THE FUTURE OF YOUR DATA w w w.q sta
D1.2 Network Load Balancing
D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June [email protected],[email protected],
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
Post-production Video Editing Solution Guide with Quantum StorNext File System AssuredSAN 4000
Post-production Video Editing Solution Guide with Quantum StorNext File System AssuredSAN 4000 Dot Hill Systems introduction 1 INTRODUCTION Dot Hill Systems offers high performance network storage products
Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.
Reference Architecture Designing High-Performance Storage Tiers Designing High-Performance Storage Tiers Intel Enterprise Edition for Lustre* software and Intel Non-Volatile Memory Express (NVMe) Storage
Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database
JIOS, VOL. 35, NO. 1 (2011) SUBMITTED 02/11; ACCEPTED 06/11 UDC 004.75 Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database University of Ljubljana Faculty of Computer and Information
Hitachi NAS Platform and Hitachi Content Platform with ESRI Image
W H I T E P A P E R Hitachi NAS Platform and Hitachi Content Platform with ESRI Image Aciduisismodo Extension to ArcGIS Dolore Server Eolore for Dionseq Geographic Uatummy Information Odolorem Systems
Availability Digest. www.availabilitydigest.com. Raima s High-Availability Embedded Database December 2011
the Availability Digest Raima s High-Availability Embedded Database December 2011 Embedded processing systems are everywhere. You probably cannot go a day without interacting with dozens of these powerful
Big data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
Windows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described
High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand
High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department
Capacity Planning Guide for Adobe LiveCycle Data Services 2.6
White Paper Capacity Planning Guide for Adobe LiveCycle Data Services 2.6 Create applications that can deliver thousands of messages per second to thousands of end users simultaneously Table of contents
Network Attached Storage. Jinfeng Yang Oct/19/2015
Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability
IBM System Storage SAN Volume Controller
SAN Volume Controller Simplified and centralized management for your storage infrastructure Highlights Enhance storage capabilities with sophisticated virtualization, management and functionality Move
Network Instruments white paper
Network Instruments white paper ANALYZING FULL-DUPLEX NETWORKS There are a number ways to access full-duplex traffic on a network for analysis: SPAN or mirror ports, aggregation TAPs (Test Access Ports),
Virtual Machine Based Resource Allocation For Cloud Computing Environment
Virtual Machine Based Resource Allocation For Cloud Computing Environment D.Udaya Sree M.Tech (CSE) Department Of CSE SVCET,Chittoor. Andra Pradesh, India Dr.J.Janet Head of Department Department of CSE
Oracle Primavera P6 Enterprise Project Portfolio Management Performance and Sizing Guide. An Oracle White Paper October 2010
Oracle Primavera P6 Enterprise Project Portfolio Management Performance and Sizing Guide An Oracle White Paper October 2010 Disclaimer The following is intended to outline our general product direction.
