Preliminary study of big data transfer over computer network
Computer Research and Modeling, 2015, vol. 7, no. 3. Plenary reports

Preliminary study of big data transfer over computer network

S. E. Khoruzhnikov 1, V. A. Grudinin 1, O. L. Sadov 1, A. Y. Shevel 1,2, A. B. Kairkanov 1,a

1 ITMO University, 49 Kronverksky Ave., St. Petersburg, Russia
2 National Research Centre "Kurchatov Institute", B. P. Konstantinov Petersburg Nuclear Physics Institute, Orlova Roscha, Gatchina, Russia
a [email protected]

Received December 1, 2014

The transfer of Big Data over a computer network has been an important and unavoidable operation in the past, it is such an operation now, and it will remain one in any feasible future. There are a number of methods to transfer data over the global computer network (Internet) with a range of tools. This paper considers the transfer of one piece of Big Data from one point in the Internet to another, in general over a long distance: many thousands of kilometers. Several free of charge systems to transfer Big Data are analyzed here. The most important architectural features are emphasized, and the idea is suggested of adding the SDN OpenFlow protocol technique for fine tuning of the data transfer over several parallel data links.

Keywords: data, Linux, transfer, SDN, OpenFlow, network

The work is supported by the Saint-Petersburg National Research University of Information Technology, Mechanics & Optics (ITMO University).

Citation: Computer Research and Modeling, 2015, vol. 7, no. 3, pp. 421-427.
I. Introduction

Big Data [Big Data, 2014] has been a known problem for many years. In each period the term Big Data means a different volume and character of the data. Keeping in mind the triple V (Velocity, Volume, Variety), we can note that all those features are relative to the current state of technology. For example, in the 1980s a volume of 1 TB was considered huge. There is a range of aspects to the problem: storage, analysis, transfer, etc. In this paper we discuss one of the important aspects of Big Data: the transfer over a global computer network.

II. The sources of Big Data

A long list of human activities (scientific and business) is known to generate large volumes of data [Information Revolution..., 2014; Square Kilometer Array, 2014; Large Synoptic Survey Telescope, 2014; Facility for Antiproton and Ion Research, 2014; International Thermonuclear Experimental Reactor, 2014; CERN, 2014; Borovick, Villars, 2013; The Center for Large-scale Data Systems Research, 2013; Johnston et al., 2013]. According to [Information Revolution..., 2014], the total volume of business mail in the world in 2012 was around 3000 PB. By consensus estimations, the total volume of stored data keeps growing every year. In this paper (and for our tests) we will assume that a data volume of around 100 TB (10^14 bytes) and more can be labeled as Big Data. Quite probably the volume regarded as Big Data will grow with time.

Another source of Big Data is the preservation of data for long periods of time: several tens of years or more. Many aspects of our personal, social, technical, and business life are now held in digital form. Large volumes of those data need to be stored and preserved. For example, the results of medical tests, data generated by important engines of various kinds (airplane engines, power station generators, etc.), and other data have to be archived for a long time. The preserved data will be kept in distributed (locally and globally) storage. It is assumed that replicas of the preserved data have to be stored in several places (on several continents) to avoid data loss due to technical, natural, or social disasters.

Historically, one of the first fields where Big Data came into reality was experiments in High Energy Physics (HEP). As a result, a number of aspects of data transfer were analyzed and a range of problems was solved. Now more and more scientific and business sectors are dealing (or plan to deal) with Big Data [Information Revolution..., 2014; Square Kilometer Array, 2014; Large Synoptic Survey Telescope, 2014; Facility for Antiproton and Ion Research, 2014; International Thermonuclear Experimental Reactor, 2014; CERN, 2014; Tierney et al.]. Recently the interest in transferring data of increasing volumes has been growing [Nam et al., 2013; Gunter et al., 2012].

III. Freely available utilities/tools for data transfer over the network

The time to transfer data over the global computer network (Internet) depends on the real data link bandwidth and on the volume of the data. Taking into account that we talk about volumes of 100 TB and more, we can estimate the minimum time required to copy the data over a network link of 1 Gbit/s capacity. Such a link gives about 100 MB/s, hence 100 TB / 100 MB/s = 10^6 secs = about 278 hours = 11.6 days. During this time the parameters of the network link might change. For example, the percentage of dropped network packets and other data link parameters can vary significantly.
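The arithmetic above is worth keeping at hand when planning a transfer. A minimal Python sketch (using the paper's assumption of about 100 MB/s effective rate on a 1 Gbit/s link) reproduces the estimate:

```python
def transfer_time_sec(volume_bytes: float, rate_bytes_per_sec: float) -> float:
    """Lower bound on transfer time: ignores interruptions and rate variation."""
    return volume_bytes / rate_bytes_per_sec

# 100 TB over ~100 MB/s (effective rate of a 1 Gbit/s link)
t = transfer_time_sec(100e12, 100e6)
print(f"{t:.0f} s = {t / 3600:.0f} h = {t / 86400:.1f} days")
# -> 1000000 s = 278 h = 11.6 days
```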
The data link might suffer operation interruptions of different durations: seconds, hours, days. Also important are many of the Linux kernel network parameters. There are several hundred kernel network parameters, though not all of them are equally sensitive or influential. Among the most important of them are the TCP window size, the MTU, the congestion control algorithm, etc. Quite important, of course, is the number of independent network links which could be used in parallel. Finally, it is seen that for each data transfer of large volume we need to be able to tune (to set) a different number of threads, a different TCP window size, etc.
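On Linux these kernel parameters are exposed as files under /proc/sys. A minimal sketch of reading (and, with root privileges, writing) a few of the parameters named above follows; the chosen keys exist on common kernels, but the exact set available depends on the kernel version:

```python
from pathlib import Path

# A few of the kernel network parameters mentioned above (illustrative, not exhaustive).
PARAMS = [
    "net/ipv4/tcp_congestion_control",  # congestion control algorithm in use
    "net/ipv4/tcp_rmem",                # min/default/max TCP receive buffer, bytes
    "net/ipv4/tcp_wmem",                # min/default/max TCP send buffer, bytes
    "net/core/rmem_max",                # hard cap on socket receive buffer size
]

def read_param(name: str) -> str:
    return Path("/proc/sys", name).read_text().strip()

def write_param(name: str, value: str) -> None:
    # Equivalent to `sysctl -w`; requires root.
    Path("/proc/sys", name).write_text(value + "\n")

for p in PARAMS:
    print(f"{p} = {read_param(p)}")
```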
Now it is time to survey the freely available data transfer tools/utilities which might be used to transfer Big Data over the network.

A. Ideas on comparing the data transfer utilities

First of all, a quick consideration of the parameters against which to compare the data transfer utilities that might help to transfer Big Data:

- Multi-stream data transfer mode: the ability to use several TCP streams in parallel.
- Multi-link data transfer mode: the ability to use more than one data link in parallel; an important feature, especially if it is possible to take into account that the available network links are not equal in bandwidth and in conditions (reliability, price, real status, etc.).
- The possibility to set low level parameters, e.g. the TCP window size, etc.
- The ability, in case of a failure, to continue the data transfer from the point of failure.

In reality the data transfer consists of many steps: read the data from storage, transfer the data over the network, write the received data to storage on the remote computer system. In this paper our attention is concentrated on the network transfer process.

B. Low level data transfer utilities/tools

We could mention several utilities for data transfer over the network (at least some of them have been known for around ten years):

- One of the low level protocols to transfer data over the network is UDT [UDT: Breaking, 2014]. UDT is a library which implements a data transfer protocol on top of UDP instead of TCP. In some cases the library can help to improve data link usage, i.e. to reduce the data transfer time.
- The protocol RDMA over Converged Ethernet (RoCE) [Tierney et al.] has been studied, and it was found that in many cases RoCE shows better results than UDP, UDT, or conventional TCP.
- MPTCP [MultiPath TCP, 2014] is an interesting protocol which permits using several data links in parallel for one data transfer. The protocol is implemented as a Linux kernel driver.
- The openssh family [OpenSSH, 2014]: well known data transfer utilities which deliver strong authentication and a number of data encryption algorithms. Data compression before encryption, to reduce the data volume to be transferred, is possible as well. There are two well known openssh flavors: a patched SSH version [Patched OpenSSH, 2014] which can use an increased buffer size, and SSH with Globus GSI authentication. There is no real restart after failure and there are no parallel data transfer streams.
- bbcp [BBCP utility to transfer, 2014]: a utility for bulk data transfer. It is assumed that bbcp is running on both sides, i.e. on the transmitter as client and on the receiver as server. The utility bbcp has many features, including setting the TCP window size, the number of TCP streams, and the I/O buffer size; resuming a failed copy; authentication with ssh; using pipes, where the source and/or destination might be a pipe; a special option to transfer small files; and many other options dealing with practical details.
- bbftp [BBFTP Utility for bulk, 2014]: a utility for bulk data transfer. It implements its own transfer protocol, which is optimized for large files (larger than 2 GB) and is secure, as it does not read the password from a file and it encrypts the connection information. The main features of bbftp are: SSH and Grid certificate authentication modules; multi-stream transfer; big TCP windows as defined in RFC 1323;
automatic retry; customizable time-outs; and other useful practical features.
- xdd [Hodson et al., 2013]: a utility developed to optimize data transfer and I/O processes for storage systems.
- fdt [Fast Data Transfer, 2014]: a Java utility for multi-stream data transfer.
- gridftp [Grid/Globus data, 2014]: an advanced reincarnation of the well known utility ftp, redesigned more than 10 years ago for the Globus Security Infrastructure (GSI) environment. The utility has many features; the main ones are: two security flavors, Globus GSI and SSH; a file with host aliases, where each next data transfer stream uses the next host alias (useful for a computer cluster); the number of parallel data transfer streams; the buffer size; restart of failed operations and the number of restarts.

Many of the mentioned utilities are quite effective for data transfer from the point of view of link capacity usage. However, Big Data transfer assumes a significant transmission time (possibly many hours, days, or more). Over such a long time it is not easy to rely on these quite simple transfer procedures.

C. Middle level: File Transfer Service

FTS3 [File Transfer Service, 2014] is a relatively new and advanced tool for transferring large volumes of data over the network. It has most of the features already mentioned above and more. There is an advanced data transfer tracking (log) feature, plus the ability to use HTTP, RESTful, and CLI interfaces to control the process of the data transfer. Another interesting development is SHIFT [Data Transfer Tools, 2014], which is dedicated to reliable data transfer in LAN and WAN. Much attention was paid to reliability, advanced tracking, data transfer performance, and the use of parallel data transfer between so called equivalent hosts (between computer clusters).

D. High level data management service: PhEDEx

PhEDEx (Physics Experiment Data Export) is used (and developed) in the collaboration around the Compact Muon Solenoid (CMS) experiment [The CMS Collaboration..., 2008; Kaselis, 2012; PhEDEx CMS Data Transfers, 2014; PHEDEX data..., 2014] at CERN [CERN, 2014]. The experiment produces a lot of experimental data (in 2013 around 130 PB were written). Data analysis requires copying the data to a range of large computing clusters (about 10 locations in different countries and on different continents) for analysis and data archiving. Later on, fractions of the data might be copied to smaller computing facilities (more than 60 locations). The total data transfer achieved is up to 350 TB per day [Kaselis, 2012]. It is possible that in the near future the volume per day will increase. Because between several sites there is more than one link, a routing technique was developed in PhEDEx which permits trying an alternative route when the default route is not available. All in all, the PhEDEx system is quite complicated, and its management service depends on the physics experiment collaboration environment. It is unlikely that PhEDEx could be used in a different environment without a redesign.

IV. Consideration

The mentioned utilities have several common useful features for data transfer. Among them:

- a client-server architecture;
- the ability to set the buffer size, TCP window size, etc.;
- the ability to perform various operations before the real data transfer and after it, to use a range of drivers/methods to read/write files to/from secondary storage, etc.;
- the use of more than one authentication technique;
- the use of a number of transfer streams;
- the use, under some conditions, of more than one network link for data transfer;
- the use of a number of techniques to make the data transfer more reliable.

The utilities are not equal in the number of parameters and in the scope of the tasks addressed. Some of them are well suited to be used as independent data transfer utilities in almost any environment. Others, like PhEDEx (in CMS) and comparable systems in the ATLAS collaboration [The Rucio project..., 2014], are dedicated to be used as part of a more complicated and specific computing environment. In other words, there is a stack of toolkits which might help in many cases to transfer Big Data over networks.

At the same time, only quite a few utilities can use more than one network link, and no tool offers fine tuning with parallel data links. Fine tuning is understood here as the possibility to apply a different policy to each data link. In general, parallel data links might be completely different in nature, features, and conditions of use. In particular, an individual QoS is assumed for each network link used in the data transfer, along with the ability to change the policy on the fly. All that suggests that a special application is required which would watch the status of the data links and change the parameters of the data transfer according to the real situation on the links, as sketched below. The QoS is planned to be set with the OpenFlow protocol [Open Networking..., 2013; Nunes et al., 2014]. The special tool perfSONAR [Zurawski et al., 2013] will be used to watch the data link status.

There is a special aspect in the procedure of comparing utilities for transferring Big Data over the computer network: real networks differ from each other. All the above circumstances suggest that comparing the variety of data transfer utilities (especially for Big Data) demands a customized testbed which is able to simulate at least the main network problems, e.g. changing RTT, delays, packet drop percentage, and so on. The development of such a testbed has been started at the network laboratory [Laboratory, 2014]. The need for a testbed becomes obvious from previously obtained measurement results [Nam et al., 2013], where comparative measurements for one data transfer stream and for many streams are shown. The data transfers there were performed with special servers, so called Data Transfer Nodes (DTN). DTNs apply several specific techniques to transfer the data from LAN to WAN destinations. A number of utilities (rsync, scp, bbcp, GridFTP) were discussed and measured just for concrete transferred file sizes (11 KB, 3.5 MB, 158 MB, 2.8 GB, and 32 GB), transferring 6 files in each case. It was discovered that the transfer speed no longer changes once the number of streams exceeds 8. At the same time there is no information about the Linux kernel parameters, nor about how the authors determined which speed was measured: the data transfer speed over the data link, or the transfer speed from the disk subsystem to main memory? It is planned to answer these questions in the developed testbed. Also, in the testbed we take into account the ideas expressed in [Gunter et al., 2012].
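To make the fine-tuning idea concrete, here is a minimal Python sketch of such a watch-and-retune loop. The functions measure() and retune() are hypothetical placeholders, not part of any existing tool: in the planned testbed the link status would come from perfSONAR measurements, and the per-link policy would be enforced through OpenFlow/QoS rules.

```python
import time
from dataclasses import dataclass

@dataclass
class LinkStatus:
    name: str
    bandwidth_mbps: float  # currently available bandwidth
    loss_percent: float    # measured packet loss

def measure(link: str) -> LinkStatus:
    """Placeholder: in the testbed this would query perfSONAR for the link."""
    raise NotImplementedError

def retune(link: str, streams: int) -> None:
    """Placeholder: apply the per-link policy, e.g. via OpenFlow/QoS rules."""
    raise NotImplementedError

def policy(status: LinkStatus) -> int:
    """Toy policy: stop using a dead link, add streams on a lossy one."""
    if status.bandwidth_mbps < 1.0:
        return 0                       # link effectively down
    return 8 if status.loss_percent > 0.1 else 2

def watch(links: list, period_sec: int = 60) -> None:
    while True:
        for link in links:
            retune(link, policy(measure(link)))
        time.sleep(period_sec)
```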
The testbed is intended to be a platform for comparing different utilities in the same environment. In addition, it is planned to use advanced techniques with the SDN approach: parallel data links with a QoS applied on each data link. As the first step, it is planned to perform comparative measurements with a range of data transfer utilities while recording all the details of the measurement conditions. That will permit comparing other data transfer methods in the future in exactly the same testbed environment.

V. The testbed progress

Now the testbed consists of two HP DL380p Gen8 servers (Intel Xeon E5-2609 CPU, 64 GB RAM) under Scientific Linux 6.5. Because it is planned to test everything in a virtual environment, two virtual machines will be used for each of the mentioned data transfer systems: one VM as transmitter and another VM as receiver. In other words, we have around ten VMs. The OpenStack cloud infrastructure (version Icehouse) has been deployed to organize the above VMs. perfSONAR has been deployed as well.
To study different types of data, a special procedure has been developed to generate a test directory with files of random length; the total volume of the test directory is defined by a parameter of the procedure. During the generation of the test data it is possible to set the mean value and the dispersion of the file size. The data inside each file in the test directory are intentionally prepared to eliminate the possible effect of data compression (if any) during the data transfer; a minimal sketch of such a generator is shown below.

At the initial stage it is planned to compare all the above data transfer systems in a local area network, to be sure that everything (all the scripts) is functioning properly. A distinct problem is to record all logs, parameters, etc. during a measurement. As mentioned earlier in the paper, many parameter values in the directory /proc might affect the speed of the data transfer. That means a requirement to automatically write the whole /proc directory to some place, let us say a "log directory". In addition, there is a need to record all the parameters used when the data transfer starts, as well as all messages from the data transfer engine/utility. Finally, the data link status is intended to be recorded too. All the mentioned information has to be saved in the "log directory". All those features have been implemented in the scripts dedicated to performing the measurements; the developed scripts with short descriptions are available online.
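The following Python sketch illustrates such a test data generator. It is an illustration under stated assumptions, not the authors' actual script; the function name and default sizes are hypothetical. File sizes are drawn from a normal distribution with the given mean and dispersion, and the content comes from os.urandom, which makes it effectively incompressible:

```python
import os
import random
from pathlib import Path

def generate_test_dir(path: str, total_bytes: int,
                      mean_size: int = 64 * 2**20,   # mean file size: 64 MB (assumed default)
                      sigma: int = 16 * 2**20) -> None:
    """Fill `path` with random-content files until `total_bytes` are written."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    i = 0
    while written < total_bytes:
        size = max(1, int(random.gauss(mean_size, sigma)))
        size = min(size, total_bytes - written)
        # os.urandom yields high-entropy bytes, defeating on-the-fly compression
        (out / f"file_{i:05d}.bin").write_bytes(os.urandom(size))
        written += size
        i += 1

# Example: generate a 1 GB test directory
# generate_test_dir("/tmp/testdata", total_bytes=2**30)
```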
References

BBCP utility to transfer the data over the network.
BBFTP utility for bulk data transfer.
Big Data.
Borovick L., Villars R. L. The Critical Role of the Network in Big Data Applications. White paper, 2013.
CERN.
Data Transfer Tools.
Facility for Antiproton and Ion Research.
Fast Data Transfer.
File Transfer Service FTS3.
Grid/Globus data transfer tool; the client part is known as globus-url-copy.
Gunter D. et al. Exploiting Network Parallelism for Improving Data Transfer Performance // High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, 2012.
Hodson S. W., Poole S. W., Ruwart T. M., Settlemyer B. W. Moving Large Data Sets Over High-Performance Long Distance Networks. Oak Ridge National Laboratory, One Bethel Valley Road, P.O. Box 2008, Oak Ridge.
Information Revolution: Big Data Has Arrived at an Almost Unimaginable Scale.
International Thermonuclear Experimental Reactor.
Johnston W. E., Dart E., Ernst M., Tierney B. Enabling high throughput in widely distributed data management and analysis systems: Lessons from the LHC.
Kaselis R., Piperov S., Magini N., Flix J., Gutsche O., Kreuzer P., Yang M., Liu S., Ratnikova N., Sartirana A., Bonacorsi D., Letts J. CMS Data Transfer operations after the first years of LHC collisions // International Conference on Computing in High Energy and Nuclear Physics 2012 (CHEP2012), Journal of Physics: Conference Series 396, IOP Publishing, 2012.
Laboratory of the Network Technology.
Large Synoptic Survey Telescope.
MultiPath TCP Linux Kernel Implementation.
Nam H. A. et al. The Practical Obstacles of Data Transfer: Why researchers still love scp // NDM'13: Proceedings of the Third International Workshop on Network-Aware Data Management, November 2013.
Nunes B. A. A., Mendonca M., Nguyen X.-N., Obraczka K., Turletti T. A Survey of Software-Defined Networking: Past, Present, and Future of Programmable Networks.
Open Networking Foundation. Software-Defined Networking: The New Norm for Networks. White paper, 2013.
OpenSSH.
Patched OpenSSH.
PhEDEx CMS Data Transfers.
PHEDEX data transfer system.
Square Kilometer Array.
The Center for Large-scale Data Systems Research at the San Diego Supercomputer Center.
The CMS Collaboration. The CMS experiment at the CERN LHC // JINST 3 S08004, 2008.
The Rucio project: the new version of the ATLAS Distributed Data Management (DDM) system services.
Tierney B., Kissel E., Swany M., Pouyoul E. Efficient Data Transfer Protocol for Big Data. Lawrence Berkeley National Laboratory, Berkeley, CA; School of Informatics and Computing, Indiana University, Bloomington, IN.
UDT: Breaking the Data Transfer Bottleneck.
Zurawski J., Balasubramanian S., Brown A., Kissel E., Lake A., Swany M., Tierney B., Zekauskas M. perfSONAR: On-board Diagnostics for Big Data. 6 p.
