CHAPTER 1 INTRODUCTION




1.1 MOTIVATION OF RESEARCH

Multicore processors have two or more execution cores (processors) implemented on a single chip, each with its own set of execution and architectural resources. These processors are also termed Chip Multiprocessors (CMPs). Depending on the design complexity of the cores and the chip, they can be classified as homogeneous multicore, in which all cores are identical in all respects; heterogeneous multicore, in which the cores have different execution capabilities but share the same ISA (Instruction Set Architecture); and hybrid multicore, in which the cores differ in both ISA and execution capabilities. Multicore processors are designed to increase efficiency through greater multitasking, parallelism and throughput.

The nature of the challenge in CMPs differs from that in symmetric multiprocessors (SMPs) in many ways. Cores in a CMP are more tightly coupled than the processors in an SMP: the L2 and L3 caches are shared by the cores within a chip, whereas in SMPs no cache at any level is shared among processors. This leads to a more complex cache and memory hierarchy design in CMPs than in SMPs. Scalability is another challenge from the architectural point of view, as the number of processors in typical SMPs is often limited to four or eight, whereas CMP designers aim to place hundreds or even thousands of cores on a single chip. Similarly, from the software design aspect, CMPs pose challenges different from those of SMPs. These include program or thread scheduling and better load distribution on the available

cores, and the level of parallelism, since CMPs favor thread-level parallelism whereas SMPs work better for process- or application-level parallelism. Other software challenges include the design of threads, algorithm decomposition techniques, programming patterns, and operating system support.

Hardware and Software Challenges

The shift towards multicore architectures poses several challenges for computer architects. Owing to the move from micrometer to nanometer technology, there has been a significant increase in the number of cores on a chip. It is now the computer designer's responsibility to find a computational structure that turns the increase in cores into a corresponding increase in computational performance and efficiency. This challenge must be addressed on several fronts: the basic architecture of each core, to improve single- and multithreaded performance; the architecture of the memory system; and holistic support for emerging multicore programming models.

Software development is also a major challenge for multicore programmers. Software running on a multicore processor must be able to exploit maximum parallelism and concurrency, with efficient scheduling and good load distribution. Although much progress has been made on these problems, much remains to be done. The goal of parallel processing is to have the running time of an application reduced by a factor proportional to the number of processors or cores used. One way to define the speedup is as the ratio of the running time on a single processor to the running time on a parallel machine, S(p) = T(1) / T(p), where T(p) is the running time on p processors. This notion of scalability depends only on the architecture, not on the application. Sometimes the application itself limits scalability, and adding further processors or cores may even degrade performance. According to this concept, an application is said to be scalable if the number of processors and the problem size are increased by

a factor, then the running time remains unchanged. Efficient scheduling has to be designed to increase parallelism on multicore processors. Load balancing is another issue that strongly affects the performance of a system: it means that the processors have nearly the same amount of program code to execute. To balance the computational load on a multicore machine, the programmer must divide the computations and communications uniformly across all available cores.

1.2 OVERVIEW OF THE PROPOSED WORK

In the first proposed method, the AMAS theory of multiagent systems is combined with the operating system scheduler to develop a new agent-based scheduling algorithm for multicore architectures. This multiagent-based scheduling algorithm minimizes the average waiting time of the processes in the centralized queue, reduces the work of the scheduler and also increases CPU performance.

In the second proposed method, a hard-soft processor affinity scheduling algorithm is implemented, which minimizes the average waiting time of the non-critical tasks in the centralized queue and avoids the context switching of critical tasks. This is achieved by assigning hard affinity to critical tasks and soft affinity to non-critical tasks, so that a context-switched critical task can be assigned back to the original core on which it previously ran. The overall organization is depicted in Figure 1.1.

In the third method, a novel agent-based scheduling and thread assignment algorithm is proposed such that none of the heterogeneous processors is kept idle and the cores are utilized efficiently. The cores are classified as fast, average and slow based on their computing power, and threads are then assigned to the respective cores according to whether their instructions are CPU- or memory-intensive.
The ultimate aim is that the heterogeneous processors within the multicore are assigned the appropriate threads.
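As an illustration of this core-performance-based assignment idea, the following Python sketch classifies cores as fast, average or slow by an assumed clock speed and steers each thread to a core class by a CPU-intensity ratio. The class names, thresholds and speed values are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch (not the thesis algorithm): classify cores by an
# assumed clock speed and steer threads to them by instruction mix.
from dataclasses import dataclass, field

@dataclass
class Core:
    cid: int
    ghz: float                               # assumed nominal clock speed
    cls: str = ""                            # "fast", "average" or "slow"
    queue: list = field(default_factory=list)

def classify_cores(cores):
    # Rank cores by speed and split the ranking into thirds.
    ranked = sorted(cores, key=lambda c: c.ghz, reverse=True)
    n = len(ranked)
    for i, c in enumerate(ranked):
        c.cls = "fast" if i < n // 3 else ("average" if i < 2 * n // 3 else "slow")
    return cores

def assign(threads, cores):
    # CPU-bound threads go to the fast class, memory-bound threads to
    # slower cores, always picking the least-loaded core in the class.
    by_cls = {"fast": [], "average": [], "slow": []}
    for c in cores:
        by_cls[c.cls].append(c)
    for name, cpu_ratio in threads:          # cpu_ratio in [0, 1], hypothetical
        cls = "fast" if cpu_ratio > 0.66 else ("average" if cpu_ratio > 0.33 else "slow")
        target = min(by_cls[cls] or cores, key=lambda c: len(c.queue))
        target.queue.append(name)
    return cores
```

In this sketch the CPU-intensity ratio stands in for the CPU- versus memory-intensive instruction analysis described above; a real implementation would derive it from hardware performance counters or profiling.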

Figure 1.1 Proposed methodologies incorporated in the thesis

In the second phase of the research, a simple load balancing algorithm is proposed, derived directly from the agent scheduling algorithms defined above. Because basic round-robin scheduling is used together with the intelligent agents, the power consumption of the processors can be equalized, leading to automatic load balancing among them. Apart from scheduling and load balancing, a partial implementation of an agent-based storage compaction algorithm is also proposed in this research work. The evaluation results show that the proposed agent scheduling and load balancing algorithms outperform the existing algorithms for HMC (Heterogeneous Multicore) processors as well as symmetric multicore processors with respect to CPU utilization.
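The round-robin dispatch underlying this load-balancing idea can be sketched as follows; the task costs and the use of accumulated load as a rough proxy for power consumption are illustrative assumptions.

```python
# Minimal sketch: tasks are handed out cyclically, so per-core load
# (a rough proxy for power draw) stays equalized across the cores.
from itertools import cycle

def round_robin_balance(task_costs, n_cores):
    """Distribute task costs cyclically over n_cores; return per-core load."""
    load = [0.0] * n_cores
    for core, cost in zip(cycle(range(n_cores)), task_costs):
        load[core] += cost
    return load

# With equal-cost tasks, every core ends up with the same load.
load = round_robin_balance([1.0] * 12, 4)
```

In the proposed algorithms the per-processor agents, rather than a central dispatcher, carry out this cyclic hand-off; the sketch only shows why the resulting load, and hence power consumption, evens out.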

1.3 RESEARCH OBJECTIVES

The chief objective of this research is to develop an approach capable of scheduling processes using simple multiagents and of scheduling large numbers of independent, indivisible jobs on a multicore platform. This scheduling automatically balances the load across many cores, leading to improved throughput. To achieve this objective, it is proposed to carry out the following:

Design of a novel agent-based scheduling algorithm using the Linux kernel.

Performance evaluation of the agent-based scheduling algorithm after selection of the cores and of SPEC-defined benchmark processes.

A new load balancing mechanism is proposed and its performance evaluated against several factors.

A new time-based agent storage compaction algorithm is also proposed, which uses memory efficiently.

A novel task allocation mechanism is proposed based on core speed for Heterogeneous Multicore (HMC) systems.

1.4 CONTRIBUTION OF THE THESIS

The research argues that multicore processors pose unique scheduling problems that call for a multiagent-based software approach which utilizes the large number of processors effectively. The work of the dispatcher is eliminated with the help of the processor agents themselves. The scheduling of each processor is similar to the self-scheduling employed in traditional multiprocessor systems; this is possible only with the help of a processor agent assigned to every processor. It is also shown that substantial enhancements are possible

in the traditional scheduler to optimize CPU cycle utilization. It is observed that the average waiting time decreases slowly as the number of cores increases. In conclusion, the new approach eliminates hardware complexity and improves CPU utilization to the maximum level. In the affinity-based scheduling, CPU utilization is maximized for the critical tasks, and idle processors are used well for the non-critical tasks. Even though there is a cost in migrating non-critical tasks to other processors, efficient and maximal utilization of the CPU is the primary concern.

1.5 THESIS OUTLINE

The thesis is organized as follows. The first chapter presents an introduction to multicore architecture and intelligent agents, and the limitations of conventional scheduling and load balancing algorithms. The second chapter presents a literature survey of scheduling and load balancing algorithms; from the inferences of this survey, the objectives of the thesis are derived. The third chapter explains the novel agent-based scheduling algorithm, which is implemented on a modified Linux 2.6.11 kernel process scheduler and evaluated in terms of average waiting time. The fourth chapter describes the novel hard-soft affinity processor scheduling and load balancing using agents. The fifth chapter presents core-performance-based agent scheduling and thread assignment for heterogeneous multicore systems. The sixth chapter explains the binary-search-tree-based load balancing algorithm, the equalization of processor power consumption through load balancing, and automatic load balancing using time-based unused space collection. The seventh chapter reports the conclusions drawn from the research work.