CS423 Spring 2015 MP4: Dynamic Load Balancer. Due April 27th, 2015 at 9:00 am



1. Goals and Overview

In this MP you will design a Dynamic Load Balancer architecture for a distributed system. You will learn about and implement system monitors, adaptation, and load-balancing schemes to improve the utilization and performance of a distributed application under limited bandwidth, limited memory, and transient CPU load.

2. Development Setup

For this MP you will develop a Dynamic Load Balancer for Linux, working entirely in userspace on Virtual Machines. The development setup simulates a distributed infrastructure in which a local small-form-factor device (e.g. a cellphone or tablet) delegates some of its processing load to a remote higher-end computer. You can choose between Java, Python, and C/C++.

You will use the provided VM as the remote higher-end computer (the Remote Node), and your group will be given an additional Virtual Machine to use as the local small-form-factor device (the Local Node). Your group's local node is named sp15-cs423-gxx-s.cs.illinois.edu, where xx is your group number; for example, group 1's local node is sp15-cs423-g01-s.cs.illinois.edu. You can turn both VMs on and off using the vSphere web client.

Two users have been created for you: a regular user cs423 with sudo access (password: tallbutter85), and a TA-only user cs423ta. Please log in to your local VM and change the cs423 password as soon as possible. DO NOT change the cs423ta password.

Finally, you are encouraged to discuss design ideas and bugs on Piazza; it is a great tool for collective learning. However, please refrain from posting large amounts of code. Two or three lines of code is fine, and high-level pseudo-code is also fine.

3. Introduction

The availability of low-cost microprocessors and the increased bandwidth of network communications have favored the development of distributed systems. However, a common problem in these systems is that some nodes sit underutilized while others are highly utilized. Maintaining even utilization increases system efficiency and performance and reduces power consumption. One common technique for achieving even utilization is dynamic load balancing, which actively transfers load from highly utilized nodes to less utilized ones.

A dynamic load-balancing algorithm usually has four main components:

1) Transfer policy: determines when a node should initiate a load transfer.
2) Selection policy: determines which jobs will be transferred.
3) Location policy: determines which node is a suitable partner to transfer load to or from. Note that for our MP this policy is trivial, because there are only two nodes in the system.
4) State information policy: determines how often state information about each node should be shared with other nodes.

4. Problem Description

In this MP, you will develop an architecture that implements a dynamic load-balancing algorithm. You will choose suitable transfer, selection, location, and state information policies, and you will evaluate and justify your decisions. You are required to implement a minimal set of components and functionality, and you will have the opportunity to further augment and improve your design. You are encouraged to evaluate the performance impact of your decisions and to present your results as part of your demonstration.

The system architecture is composed of two nodes: a CPU-limited Mobile node and a high-end Server node. The two nodes work together to compute a vector-addition task. You need to malloc a double vector A with 1024 * 1024 * 32 elements and initialize every element to 1.111111. For each element A[i], your task is to perform the following computation (DO NOT optimize it):

    for (int j = 0; j < 1000; j++) {
        A[i] += 1.111111;
    }

Note that this workload is fully parallelizable. Your system should divide it into smaller chunks, called jobs. Jobs are independent of each other, and each job should take roughly the same time to complete. The job size is up to you, but initially there must be at least 512 jobs. When transferring jobs, you MUST transfer the real data within vector A.
You cannot simply tell the other node "please compute the vector addition from, say, index 1000 to 2000"; you must transfer A[1000-2000] to the other node so it can perform the computation.

The system has three phases:

1) Bootstrap phase: Initially the Local Node holds the entire workload, and it must transfer half of the workload to the Remote Node. You must print enough information (say, the job IDs) on screen so that we can tell the jobs are indeed being transferred.
2) Processing phase: After the bootstrap phase, each node divides its half of the workload into jobs, so each node has a queue with roughly the same number of jobs. Each node then processes its jobs one by one, storing the results locally. The dynamic load balancer only operates during this phase (by applying the transfer, selection, location, and state information policies).
3) Aggregation phase: After all jobs have been processed successfully, the system must aggregate the results into a single node (either local or remote) and display them.

5. Implementation Overview

This section presents the minimal architecture you must implement. Its components are the following:

Job Queue: Stores the pending jobs and dispatches them to the Worker Thread.

Worker Thread: Processes each job and stores the result locally. The Worker Thread must react to throttling: it must limit its utilization to a percentage value provided by the Adaptor or the user. For example, a throttling value of 70% means that during each 100 milliseconds, the Worker Thread must sleep for 30 milliseconds and process jobs for 70 milliseconds. During the sleeping period the worker thread must not spin on a variable; you must use an adequate system mechanism to put the thread to sleep.

Hardware Monitor: Collects information about the hardware. You are required to collect the CPU utilization of the system.

However, as noted in the next section, you can also collect network bandwidth and other system information used by any of the Load Balancer's policies (transfer policy, state information policy, and selection policy). As part of the Hardware Monitor, you must implement an interface that lets the user dynamically specify a throttling value to limit the Worker Thread during execution. During the demo, we will try different throttling values (e.g. 0.1, 0.25, 0.5, 0.75) on the local node, the remote node, or both.

Transfer Manager: Performs a load transfer upon request from the Adaptor. It must take jobs from the Job Queue and send them to the remote node, and it must receive any jobs sent by the remote node and place them in the local Job Queue. You can use any protocol you choose (e.g. TCP, UDP, HTTP). You MUST print enough information on screen (such as job IDs) when sending and receiving jobs, so that during the demo we can tell that jobs are indeed being transferred between the two nodes.

State Manager: Transfers system state to the remote node, including the number of jobs pending in the queue, the current local throttling value, and all the information collected by the Hardware Monitor. For this component you will need to choose a state information policy, which dictates how often the collected information is exchanged (e.g. on time intervals or on events). Careful design of this policy is important because of its performance, stability, and overhead trade-offs.

Adaptor: Applies the transfer and selection policies. It must use the information provided by the State Manager and the Hardware Monitor to decide whether a load-balancing transfer should occur.
Please note that the most basic transfer policy you can implement considers the Queue Length or the Estimated Completion Time (based on the throttling value and the queue length). A more sophisticated policy should use one of these together with other factors. As part of your transfer policy, you must also choose between sender-initiated, receiver-initiated, and symmetrically initiated transfers; each has different trade-offs in performance, overhead, and stability.

6. Further Improvements

In this MP, you will be awarded extra points for improving your code beyond the basic architecture. Several issues have room for improvement, and several design decisions should be validated. Some ideas:

- You can analyze the impact of choosing between sender-initiated, receiver-initiated, and symmetrically initiated transfers. Show graphs and highlight the trade-offs in your final report. (5 pts)
- You can extend the Hardware Monitor to consider bandwidth and delay. Assume the communication between the Local Node and the Remote Node is limited to 10 Mbit/s with an average delay of 54 ms (std. dev. 32 ms) and an average packet loss of 0.2%, and use this information as part of your transfer policy. Limited bandwidth can make load balancing a very costly operation and further degrade performance. (5 pts)

- You can compress the data when sending it over the network to save time and bandwidth. (10 pts)
- You can implement a graphical user interface that shows the progress of your jobs and clearly shows when your system initiates a transfer or changes a throttling value. (15 pts)
- You can increase the number of Worker Threads to exploit the concurrency of the system. (5 pts)

7. Software Engineering

As in previous MPs, your code should include comments where appropriate. It is not a good idea to restate what a function does in pseudo-code; instead, provide a high-level overview of the function, including any preconditions and postconditions of the algorithm. Some functions might need only a one-line comment, while others might warrant a longer paragraph. Your code must also follow good programming practices, including small functions and good naming conventions.

For this MP you should choose adequate data structures. C++ and Java provide a large number of data structures with various performance characteristics. During the interview, you should justify your data-structure design decisions.

8. Hand-In

All MPs will be collected using compass2g. You need to submit a single group_id_mp4.zip file (replace id with your group number; if the filename is incorrect, -10 pts) through compass2g. Only one group member needs to do the submission. You need to provide all your source code (*.c and *.h, including the given source code) and a Makefile that compiles your code. Do not include binaries or other automatically generated files. Finally, you must write a document briefly describing your implementation and design decisions, including all the analysis that influenced them and any experimental evaluation you performed.

In addition to the submission, you should be able to demo your source code and answer questions during an interview with the graders on Monday. This interview will be a major part of the grading, and all members of the group must be present. Instructions for signing up for interview slots will be posted later on Piazza.

9. Grading Criteria

In this MP, you will be awarded extra credit for implementing additional features. However, you are required to implement the minimal set of features described by the proposed architecture in this handout.

Criterion                                                         Points
Bootstrap Phase                                                       10
Aggregation Phase                                                     10
Adaptor, including Selection and Transfer Policy                      15
Transfer Manager                                                      10
State Manager                                                         10
Hardware Monitor                                                      10
Job Queue                                                             10
Efficient use of locks, synchronization primitives,
  and condition variables                                              5
Code compiles successfully                                             5
Code follows good software-engineering principles                      5
Documentation                                                         10
Bonus points - further improvements                             up to 40
TOTAL                                                           140 / 100

10. References

[1] N. G. Shivaratri, P. Krueger, and M. Singhal, "Load Distributing for Locally Distributed Systems," IEEE Computer, 1992, pp. 33-44.