Development of Monitoring and Analysis Tools for the Huawei Cloud Storage



September 2014
Author: Veronia Bahaa
Supervisors: Maria Arsuaga-Rios, Seppo S. Heikkila
CERN openlab Summer Student Report 2014

Abstract

CERN is the largest particle physics research centre in the world. Its experiments generate large amounts of data that must be stored, processed and analysed, so the storage solutions must provide large data capacity, scalability and reliability. The CERN openlab-Huawei partnership aims at testing and evaluating the performance of the Huawei Universal Distributed Storage (UDS) system; benchmarking and monitoring tools are developed to investigate the storage system's behaviour. This report describes the upgrades made to the monitoring and analysis tools used with the Huawei cloud storage systems at CERN. The first part of the report covers the additions to the monitoring system of the cloud storage; the rest describes the improvements to the scripts that list the files on the storage and monitor their size.

Table of Contents

1 Introduction
2 General Monitoring System for the Huawei Cloud Storage
2.1 Log files retrieval
2.2 Log files analysis
2.3 Plotting events
3 Parallelization of File-listing Script
3.1 Parallelization using one thread per user account
3.2 Parallelization using a pool of processes
4 Size Monitoring
5 Conclusions
6 References

1 Introduction

The Huawei storage system, located at CERN's computing centre, consists of two UDS (Universal Distributed Storage) systems [1]. The first UDS (Figure 1) has 768 TB of storage space divided over 384 storage nodes and controlled by seven controller nodes. The second UDS (Figure 2) is a newer generation with newer software: it has 1200 TB of storage space divided over 300 storage nodes and controlled by four controller nodes. Both systems use the S3 (Simple Storage Service) protocol, which provides an API for making requests to the system with HTTP methods such as GET, PUT or LIST. Log analysis and monitoring are carried out for both systems: log files are continuously generated by the storage systems and are analysed in order to understand the system behaviour.

Figure 1. First Huawei UDS
Figure 2. Second Huawei UDS
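As a concrete illustration of the S3 request style mentioned above, the sketch below builds the HMAC-SHA1 signature that S3-compatible stores expect in the Authorization header of each request. The credentials, bucket and date are hypothetical and are not taken from the CERN setup.

```python
import base64
import hashlib
import hmac


def s3_v2_signature(secret_key, method, resource, date, content_type=""):
    """Sign an S3-style request (AWS Signature Version 2).

    The string to sign is Method, Content-MD5 (empty here), Content-Type,
    Date and the canonicalized resource, joined by newlines.
    """
    string_to_sign = "\n".join([method, "", content_type, date, resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Hypothetical credentials and bucket, for illustration only.
sig = s3_v2_signature("secret", "GET", "/mybucket/",
                      "Tue, 02 Sep 2014 12:00:00 GMT")
auth_header = "AWS %s:%s" % ("accesskey", sig)
print(auth_header)
```

A GET on a bucket path such as `/mybucket/` is how an S3 LIST of the bucket contents is expressed on the wire.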

2 General Monitoring System for the Huawei Cloud Storage

The aim of the monitoring system is to understand the behaviour of the storage systems and to identify events, such as file operations and error messages generated by the storage software, together with the times at which they happen. This is done by retrieving log files from the storage and then parsing and analysing them. Figure 3 shows the steps of the monitoring process and the programming language used in each step: retrieving logs (Bash), then parsing and analysing (Python), then plotting events (Python).

Figure 3. Monitoring process steps

The storage monitoring that previously existed enabled log file downloading and parsing for the first Huawei UDS only. The new feature makes the monitoring system a general one for both Huawei storages: parameters that are specific to each UDS, such as the IP addresses of the front-end nodes or the available file operations, are read from a configuration file created per UDS. Thus, the same monitoring system can be used for both storages.

2.1 Log files retrieval

Log files contain entries for events, such as file operations, errors or status messages, which the storage software has produced, together with their timestamps. Two types of log files are retrieved from the storage system: access logs and Java logs. Access logs contain information about file operations that have been performed inside the storage system, such as a GET or DELETE of a file; Figure 4 shows an example of an access log entry. Java logs contain entries from the storage system software: any kind of error or event that is logged can be found in these files.

Figure 4. An example of an access log entry for the second Huawei UDS
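The per-UDS configuration idea can be sketched as below. The section and option names are illustrative assumptions, not the report's actual configuration format; the point is only that everything UDS-specific lives in a file the scripts read at startup.

```python
import configparser

# Hypothetical per-UDS configuration file contents; names and addresses
# are made up for illustration.
uds1_cfg = """
[storage]
name = uds1
frontend_ips = 192.168.1.1, 192.168.1.2
operations = GET, PUT, DELETE, HEAD
"""

parser = configparser.ConfigParser()
parser.read_string(uds1_cfg)

# The monitoring scripts can then iterate over the front-end nodes of
# whichever UDS the configuration file describes.
frontends = [ip.strip()
             for ip in parser.get("storage", "frontend_ips").split(",")]
print(frontends)
```

Switching the monitoring to the other UDS then only means pointing the scripts at a different configuration file.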

There are two options for retrieving logs from the storage: get only the most recent logs, or also get historic logs. Retrieval is done using secure copy from the remote host to the local host while preserving the original file attributes, such as modification and access times. The existing script could only retrieve log files from the first UDS; the script is now general and can retrieve logs from either storage system.

2.2 Log files analysis

Analysing logs is an important part of understanding the behaviour of the storage system. Log files of both types are parsed to display a readable summary of the events occurring in a given interval of time: the events are extracted from the log files, together with the number of times each event occurred. If the requested range to be analysed is longer than the contents of the most recent log file, older log files are appended as needed. An example of the summary resulting from parsing the access logs of the first UDS is shown in Figure 5; the summary shows the number of times each event occurred on each front-end node.

Figure 5. Summary from analysing the access logs of the first UDS
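The per-node event counting can be sketched as follows. The real access-log format is the one shown in Figure 4; the lines below are a simplified, hypothetical approximation (timestamp, front-end node, operation, HTTP status).

```python
from collections import Counter

# Hypothetical access-log lines, simplified for illustration.
log_lines = [
    "2014-09-02T12:00:01 node1 REST.GET.OBJECT 200",
    "2014-09-02T12:00:03 node1 REST.PUT.OBJECT 200",
    "2014-09-02T12:00:04 node2 REST.GET.OBJECT 404",
    "2014-09-02T12:00:09 node1 REST.GET.OBJECT 200",
]

# Count how many times each event occurred on each front-end node.
counts = Counter()
for line in log_lines:
    timestamp, node, operation, status = line.split()
    counts[(node, operation)] += 1

for (node, operation), n in sorted(counts.items()):
    print(node, operation, n)
```

The summary of Figure 5 is essentially this table, one row per (node, event) pair.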

2.3 Plotting events

Data from the log files can be visualized by a previously developed plotting tool [2]. A graph showing the distribution of a chosen file operation during the given interval, based on the log file entries, can be drawn. The graph can be plotted separately for each front-end node or as a sum over all nodes. Figure 6 shows a plot (event speed in events per 10 seconds versus timestamp, with one line per front-end node) of the same data that was analysed in Figure 5.

Figure 6. Plotting a REST.HEAD.OBJECT event for the first UDS
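The core of such a plot is binning the event timestamps into fixed 10-second intervals before drawing them. A minimal sketch with made-up timestamps (seconds since the start of the interval):

```python
from collections import Counter

# Hypothetical event timestamps in seconds, for illustration only.
timestamps = [0, 3, 9, 12, 15, 31]

# Bin each event into the 10-second interval it falls in, giving the
# "events every 10 seconds" speed that is plotted on the y-axis.
bins = Counter((t // 10) * 10 for t in timestamps)

for start in sorted(bins):
    print("%3d-%3ds: %d events" % (start, start + 10, bins[start]))
```

Feeding these binned counts to a plotting library, one series per front-end node, yields a graph of the kind shown in Figure 6.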

3 Parallelization of File-listing Script

The file-listing script lists, per user account, the files stored on the Huawei systems. The existing script was slow because it used a single thread; the improvement was to increase its speed through parallelization. Two different implementations were tried to get the best speedup.

3.1 Parallelization using one thread per user account

The first approach is to use one thread per user account, so that all accounts are analysed in parallel. Each thread writes its account's data to a file, and when all threads are done the files are concatenated into one, as shown in Figure 7.

3.2 Parallelization using a pool of processes

The second approach is to define a pool of worker processes that list files per bucket, each process writing its bucket's data to a file. There are usually many more buckets than processes: as soon as a process completes the listing of a bucket, it is assigned another one, until all buckets have been processed. The processes then terminate and all files are concatenated, as shown in Figure 8.

Figure 7. First parallelization approach (one thread per account)
Figure 8. Second parallelization approach (process pool over buckets)

The number of worker processes can be tuned to get the highest speedup. Figure 9 shows the execution times corresponding to different pool sizes. The script was tested with 21 user accounts of one hundred buckets each, containing 3757369 files in total. The tests were run on a server equipped with 48 GB of RAM and 24 Intel Xeon 2.27 GHz cores.

Figure 9. Execution time vs. number of processes

Figure 10 compares the execution times of the original script and the two parallelization approaches, using the same accounts as before; in the second approach the process pool was created with 21 processes. The second parallelization approach gives the highest speedup: an average factor of 6.5 for 21 processes.

Figure 10. Execution times (seconds) of the original file-listing script and the two parallelization approaches

4 Size Monitoring

It is sometimes necessary not only to list the files and count them but also to calculate the total size of the files stored in the cloud storage. The existing code for calculating the size also used a single thread, so, given the high speedup of the parallelised file-listing script, size calculation was added to the parallel script as a new feature. Figure 11 compares the execution times of the original size-calculation script and the parallelised file-listing script with the added size calculation, which was built on the second parallelization approach (section 3.2); the same accounts used in the previous chapter were used for testing.

Figure 11. Execution times (seconds) of the original size-calculation script and the parallelized script

5 Conclusions

The analysis and monitoring system of the CERN openlab Huawei cloud storage has been improved into a general monitoring system that allows log file analysis and event visualization for both CERN openlab Huawei cloud storage generations. Moreover, a parallelisation technique has been applied to the file-listing script, speeding it up by a factor of 6.5. A new feature has been added to retrieve the total size of the files already stored in both cloud storages. In the future, a storage node analysis collecting CPU and disk metrics would be an interesting complement to the current analysis system.

6 References

[1] Zotes Resines, M., Heikkilä, S.S., Duellmann, D., Adde, G., Toebbicke, R., Hughes, J. & Wang, L., "Evaluation of the Huawei UDS cloud storage system for CERN specific data", Journal of Physics: Conference Series, Vol. 513(4), 2014.
[2] Lindqvist, C., "Improved Metrics Collection and Correlation for the CERN Cloud Storage Test Framework", CERN openlab Summer Student Report, 2013.