IMPLEMENTING GREEN IT




Saint Petersburg State University of Information Technologies, Mechanics and Optics
Department of Telecommunication Systems
IMPLEMENTING GREEN IT APPROACH FOR TRANSFERRING BIG DATA OVER PARALLEL DATA LINK
17 June 2015
Name: Stefanos Georgiou
Supervisors: Andrey Y. Shevel, Eric Rondeau

PRESENTATION OUTLINE
- Introduction
- Goal of Thesis
- The Importance of This Work
- Testbed Implementation
- Cloud Infrastructure: OpenStack
- Virtual Machines
- Testing Utilities
- Scripts and Their Purpose
- Scenarios
- Results
- Conclusion
- Future Work

INTRODUCTION
The basic concept of this work is to transfer Big Data over parallel data links.
Big Data refers to datasets so large or complex that they cannot be processed by traditional data-processing applications.
The "Three Vs": Velocity, Volume, and Variety.
Big Data is not a narrow field of study; it covers many aspects, such as storage, analysis, transfer, preservation, capture, and visualization.

GOAL OF THESIS (1)
Assist the ITMO research team with their project (transferring Big Data over parallel data links with an SDN/OpenFlow approach) and apply Green IT methods to the data transfer system.
The main task is to compare existing data transfer applications to determine which achieves the highest transfer speed, under which conditions, and to explain why.
Essential information had to be stored and preserved for future analysis.

GOAL OF THESIS (2)
Create scripts that allow users to repeat exactly the same experiments.
Gather a large amount of information about the experiments and all parameters used.
Prepare a complete platform that can be reused with other data transfer applications in further research.
Test in a virtual environment and on a real network.

THE IMPORTANCE OF THIS WORK
Data transfer applications can move huge datasets over long periods of time, which is why a Green IT approach has to be taken into account.
The work focuses on Cloud computing, an emerging area of Computer Science, and many companies already rely on virtual environments.
Many studies perform transfers with data transfer applications (utilities), but few details are given (system information, dataset, etc.).

TESTBED IMPLEMENTATION (1)
The testbed is a powerful platform for testing our scientific theories (ITMO, stor.naulinux.ru).

Hardware           | Server 1 (SV)                              | Server 2 (SM)                              | Storage Area Network
CPU                | 2 x Intel 6-core Xeon E5-2640v2 @ 2.5 GHz  | 2 x Intel 4-core Xeon E5-2609v2 @ 2.4 GHz  | -
Main Memory (RAM)  | 64 GB DDR3                                 | 32 GB DDR3                                 | -
Hard Disk Drive    | 1000 GB                                    | 4 x SATA 500 GB, RAID 5                    | HP, 16 x SAS HDD 450 GB
Operating System   | Scientific Linux CE 6.5                    | Scientific Linux CE 6.5                    | -

TESTBED IMPLEMENTATION (2)
Server at the Petersburg Nuclear Physics Institute (PNPI).

Hardware           | Server
CPU                | 16 x Intel(R) Xeon(R) E5-2650v2 @ 2.6 GHz
Main Memory (RAM)  | 99 GB DDR3
Hard Disk Drive    | RAID 6, 100 TB
Operating System   | Scientific Linux CE 6.5

CLOUD INFRASTRUCTURE
OpenStack (Icehouse release) is running on the testbed servers.
Free of charge.
User-friendly GUI, the Dashboard, for managing users' Virtual Machines.
Allows server resources to be shared among users and a number of different Virtual Machines to be deployed.
Requires less hardware.

OPENSTACK VIRTUAL MACHINES
Tests in the virtual environment (both VMs on the same server):

Instance Name | Location | VCPUs | RAM size | HDD size
Sender        | ITMO     | 8     | 16 GB    | 160 GB
Receiver      | ITMO     | 8     | 16 GB    | 160 GB

Tests on the real network (about 40 km apart over the public network):

Instance Name | Location | VCPUs | RAM size | HDD size
Sender        | ITMO     | 8     | 16 GB    | 160 GB
Receiver      | PNPI     | 8     | 16 GB    | 160 GB

All virtual machines were tuned following the Linux tuning recommendations of the Energy Sciences Network (ESnet) website (an illustrative sketch is shown below).
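The ESnet host tuning guidance is essentially a set of kernel TCP parameters. The sysctl settings below are a sketch of that kind of tuning; the values are common ESnet-style examples, not the exact settings used on the thesis VMs.

    # Illustrative ESnet-style TCP tuning (example values, not the exact thesis settings).
    # Append to /etc/sysctl.conf and apply with: sysctl -p
    net.core.rmem_max = 67108864                 # max receive socket buffer (64 MB)
    net.core.wmem_max = 67108864                 # max send socket buffer (64 MB)
    net.ipv4.tcp_rmem = 4096 87380 33554432      # min/default/max TCP receive buffer
    net.ipv4.tcp_wmem = 4096 65536 33554432      # min/default/max TCP send buffer
    net.ipv4.tcp_congestion_control = htcp       # congestion control suited to high-BDP paths
    net.ipv4.tcp_mtu_probing = 1                 # robustness on paths with MTU problems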

TESTING UTILITIES
Fast Data Transfer (FDT): written in Java; can be used as a client/server application.
BBCP: written in C; very easy to use; no server or daemon process needed.
BBFTP: written in C; client/server architecture; efficient for large files.
GridFTP (Globus Toolkit): written in both Java and C; shares users' computing resources.
File Transfer Service (FTS3): written in C, C++, and Bash; used as a web service; supports transfer scheduling.

WHY WE USED THESE UTILITIES
Common features (example invocations below):
- Multi-stream transfers
- User-adjustable TCP window size
- Tunable I/O buffer size
- Encrypted authentication
- Open-source data transfer applications
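For illustration, the stream count and window/buffer size are exposed as command-line options in these tools. The commands below are only a sketch: the host name and paths are placeholders, and the option values are examples rather than the thesis settings.

    # BBCP: -s sets the number of parallel streams, -w the window/buffer size
    bbcp -s 16 -w 2m /data/testset/* user@receiver.example.org:/data/incoming/

    # GridFTP (globus-url-copy): -p sets parallel streams, -tcp-bs the TCP buffer size
    globus-url-copy -p 16 -tcp-bs 2097152 \
        file:///data/testset/file_1.bin \
        gsiftp://receiver.example.org:2811/data/incoming/file_1.bin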

WHY USE SCRIPTS?
A number of BASH scripts were created to execute the dataset transfers with the different utilities.
They can be found at https://github.com/itmo-infocom/bigdata, together with a brief description of how to use them.
It is faster to run multiple, different scenarios with the scripts than to execute the transfers one by one.
They also capture extensive information about each transfer and the system parameters and store it for further analysis.

SCRIPTS AND THEIR PURPOSES (1)
Test-data: generates a directory of binary files of random length. The reason for such a script is to make the data incompressible (fair testing across utilities); see the sketch below.
CopyData.[utility name]: initiates the dataset transfer and captures the essential information about it. It creates the following files:
- Abstract: information about the transfer execution time and the parameters used
- Log: all log information about the transfer
- Traceroute: the packets' source-to-destination path
- Sosreport: a copy of /proc with all system information
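A minimal sketch of a Test-data-style generator follows; the directory name, file count, and size range are illustrative assumptions, not the exact parameters of the published scripts. Reading from /dev/urandom makes the files incompressible.

    #!/bin/bash
    # Generate a directory of incompressible binary files of random length
    # (a sketch of the Test-data idea; names, counts, and sizes are illustrative).
    OUTDIR=${1:-testset}
    NFILES=${2:-244}
    mkdir -p "$OUTDIR"
    for i in $(seq 1 "$NFILES"); do
        SIZE_MB=$(( (RANDOM % 21) + 90 ))   # random size between 90 and 110 MB
        dd if=/dev/urandom of="$OUTDIR/file_$i.bin" bs=1M count="$SIZE_MB" 2>/dev/null
    done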

SCRIPTS AND THEIR PURPOSES (2)
ExecuteMultipleCopyData.[utility name]: launches different scenarios; the user can run it with different numbers of parallel streams and TCP window sizes (a sketch of such a sweep is shown below).
PlotGraphs.[utility name]: plots graphs with gnuplot and outputs a PostScript file.
set-passwordless-ssh: sets up passwordless SSH login to the remote host.
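A minimal sketch of an ExecuteMultipleCopyData-style parameter sweep, assuming a hypothetical ./CopyData.bbcp wrapper that takes the stream count and window size (in KB) as its two arguments; the real scripts in the repository may use a different interface.

    #!/bin/bash
    # Sweep over the parallel stream counts and TCP window sizes from the tested scenarios.
    # "./CopyData.bbcp <streams> <window_kb>" is a hypothetical interface, for illustration only.
    STREAMS="1 2 4 8 16 32 64"
    WINDOWS_KB="131 262 524 1048 2097 4194 8388 16777 33554"
    for s in $STREAMS; do
        for w in $WINDOWS_KB; do
            echo "=== streams=$s window=${w}KB ===" | tee -a sweep.log
            ./CopyData.bbcp "$s" "$w" >> sweep.log 2>&1
        done
    done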

TESTED SCENARIOS
Virtual Environment (ITMO - ITMO)
- Dataset size: 25 GB, consisting of 244 files of about 100 MB each
- Parallel streams: 1, 2, 4, 8, 16, 32, and 64
- Window size: 131, 262, 524, 1048, 2097, 4194, 8388, 16777, and 33554 kilobytes
Real Network (PNPI - ITMO)
- Dataset size: 25 GB, consisting of 244 files of about 100 MB each
- Parallel streams: 1, 2, 4, 8, 16, 32, and 64
- Window size: 131, 262, 524, 1048, 2097, 4194, 8388, 16777, and 33554 kilobytes

RESULTS (1) Data located on the Hard Disk Drive and on an NFS mount (BBCP)

RESULTS (2) Transfer from PNPI to ITMO (FDT)

RESULTS (3) Transfer from PNPI to ITMO (BBFTP)

CONCLUSION
In the virtual environment we can always achieve higher transfer rates because its network is implemented in software; on the real public network we face congestion and collisions.
Running tests in the virtual environment before testing on the real network was time-efficient, and decisions could be made before moving to the real network.
Transferring data from the NFS mount instead of the local HDD helped achieve higher transfer speeds, which makes the system more efficient.
Using a very large number of parallel streams and a big window size only consumes more resources and energy, and may not provide higher transfer speed.

FUTURE WORK
- Run tests with the remaining utilities.
- Test larger amounts of data.
- Investigate energy monitoring, which is currently not possible for Virtual Machines.
- Use a RAID storage system that allows parallel reads/writes to overcome the HDD limitation.

QUESTIONS