CHAPTER 1
INTRODUCTION

1.1 MOTIVATION OF RESEARCH

Multicore processors have two or more execution cores (processors) implemented on a single chip, each with its own set of execution and architectural resources. These multicore processors are also termed Chip Multiprocessors (CMPs). Depending on the design complexity of the cores and the chip, they can be classified into three types. In a homogeneous multicore, all cores are identical in all respects. In a heterogeneous multicore, the cores have different execution capabilities but share the same ISA (Instruction Set Architecture). In a hybrid multicore, the cores differ in both ISA and execution capabilities. Multicore processors are designed to increase efficiency by increasing multitasking, parallelism and throughput.

The nature of the challenge in CMPs differs from that in multiple-processor systems (SMPs) in many ways. For example, cores in CMPs are more closely coupled than processors in SMPs. The L2 and L3 caches are shared by the multiple cores within a chip, whereas in SMPs no cache at any level is shared by the processors. This leads to a more complex cache and memory hierarchy design in CMPs than in SMPs. Scalability is another challenge from an architectural point of view: the number of processors in general SMPs is often limited to four or eight, whereas CMP designers are considering placing hundreds or even thousands of cores on a single chip. Similarly, from the software design aspect, CMPs pose different challenges than SMPs. These include program or thread scheduling and better load distribution on the available
cores, and the level of parallelism, since CMPs favor thread-level parallelism whereas SMPs work better for process-level or application-level parallelism. Other software challenges include the design of threads, algorithm decomposition techniques, programming patterns and operating system support.

Hardware and Software Challenges

The shift towards multicore architectures poses several challenges for computer architects. Due to a big change in technology, from micrometer to nanometer, there is a significant increase in the number of cores on a chip. It is now the computer designer's responsibility to determine a computational structure that can transform the increase in cores into a corresponding increase in computational performance. This challenge must be dealt with on several fronts, such as the basic architecture of each processor (core) to increase single-thread or multithread performance, the architecture of the memory system, and a holistic approach to supporting emerging programming models for multicore processors. Software development is also a major challenge for multicore programmers. The software that runs on a multicore processor must be capable of exploiting maximum parallelism and concurrency, efficient scheduling and good load distribution. Although much progress has been made on these problems, much remains to be done.

The goal of parallel processing is to reduce the running time of an application by a factor proportional to the number of processors or cores used. One way to define the speedup is as the ratio of the running time on a single processor to the running time on a parallel machine. This type of scalability depends only on the architecture, not on the application. Sometimes the application itself is the limiting factor, and adding further processors or cores may even degrade performance. According to this concept, an application is said to be scalable if the number of processors and the problem size are increased by
the same factor and the running time remains unchanged. Efficient scheduling has to be designed to increase parallelism on multicore processors. Load balancing is another issue that strongly affects the performance of a system. It means that the processors have nearly the same amount of program code to execute. To balance the computational load on a multicore machine, the programmer must divide the computations and communications uniformly across all available cores.

1.2 OVERVIEW OF THE PROPOSED WORK

In the first proposed method, the AMAS theory of multiagent systems is combined with the scheduler of the operating system to develop a new agent-based scheduling algorithm for multicore architectures. This multiagent-based scheduling algorithm promises to minimize the average waiting time of the processes in the centralized queue, reduce the work of the scheduler and increase CPU performance.

In the second proposed method, a hard-soft processor affinity scheduling algorithm is implemented, which promises to minimize the average waiting time of the non-critical tasks in the centralized queue and to avoid the context-switching overhead of critical tasks. This is achieved by assigning hard affinity to critical tasks and soft affinity to non-critical tasks, so that a context-switched critical task is reassigned to the same core on which it previously ran. The entire organization is depicted in Figure 1.1.

In the third method, a novel agent-based scheduling and thread assignment algorithm is proposed such that none of the heterogeneous processors is left idle and the cores are utilized efficiently. The processors are classified as fast, average and slow cores based on their computing power. Threads are then assigned to the respective cores based on their CPU-intensive and memory-intensive instructions.
The ultimate aim is to assign the appropriate threads to the heterogeneous processors within the multicore processor.
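
The third method can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: the `Core` class, the `cpu_ratio` score, the rank-based fast/average/slow rule and the choice to send the most CPU-intensive threads to the fastest cores are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    speed: float                       # hypothetical relative computing power
    threads: list = field(default_factory=list)


def classify(cores):
    """Label each core fast, average or slow by its rank in speed (assumed rule)."""
    ranked = sorted(cores, key=lambda c: c.speed, reverse=True)
    third = max(1, len(ranked) // 3)
    labels = {}
    for rank, core in enumerate(ranked):
        if rank < third:
            labels[core.name] = "fast"
        elif rank < len(ranked) - third:
            labels[core.name] = "average"
        else:
            labels[core.name] = "slow"
    return labels


def assign(threads, cores):
    """threads: list of (name, cpu_ratio); cpu_ratio is an assumed share of CPU-bound work."""
    by_speed = sorted(cores, key=lambda c: c.speed, reverse=True)
    by_cpu = sorted(threads, key=lambda t: t[1], reverse=True)
    for i, (name, _) in enumerate(by_cpu):
        if i < len(by_speed):
            target = by_speed[i]       # first pass: one thread per core, so none idles
        else:
            # afterwards: the core with the fewest threads relative to its speed
            target = min(by_speed, key=lambda c: len(c.threads) / c.speed)
        target.threads.append(name)
    return {c.name: c.threads for c in by_speed}


cores = [Core("big", 3.0), Core("mid", 2.0), Core("little", 1.0)]
threads = [("render", 0.9), ("encode", 0.7), ("index", 0.4),
           ("fetch", 0.2), ("io_log", 0.1)]
print(classify(cores))
print(assign(threads, cores))
```

With at least as many threads as cores, the first pass gives every core one thread, which mirrors the stated goal that no heterogeneous processor is kept idle.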
Figure 1.1 Proposed methodologies incorporated in the thesis

In the second phase of the research, a simple load balancing algorithm is proposed, which is derived directly from the defined agent scheduling algorithms. Because basic round robin scheduling is used along with the intelligent agents, the power consumption of each processor can be equalized, leading to automatic load balancing among the processors. Apart from scheduling and load balancing, a small implementation of an agent-based storage compaction algorithm is also proposed in this research work. The evaluation results show that these agent scheduling and load balancing algorithms outperform the existing algorithms for HMC (Heterogeneous Multicore) processors as well as symmetric multicore processors with respect to CPU utilization.
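
The link between round robin scheduling and load balancing can be seen in a toy simulation. This is a sketch under simplifying assumptions, not the thesis implementation: time advances in synchronous rounds of one quantum, cores take turns picking first, and per-core busy time stands in for power consumption. The function name `round_robin` and its parameters are hypothetical.

```python
from collections import deque


def round_robin(bursts, n_cores, quantum):
    """Return the busy time of each core after serving one shared queue.

    bursts  : CPU time needed by each process
    n_cores : number of cores pulling work from the centralized queue
    quantum : time slice granted per pick
    """
    queue = deque(enumerate(bursts))   # (pid, remaining time)
    busy = [0] * n_cores
    first = 0                          # core that picks first in this round
    while queue:
        unfinished = []
        for k in range(n_cores):
            if not queue:
                break
            core = (first + k) % n_cores
            pid, remaining = queue.popleft()
            run = min(quantum, remaining)
            busy[core] += run
            if remaining > run:
                unfinished.append((pid, remaining - run))
        # preempted processes rejoin the tail only after the round ends,
        # so a process never runs on two cores in the same round
        queue.extend(unfinished)
        first = (first + 1) % n_cores  # rotate so no core is favored
    return busy


print(round_robin([8, 6, 4, 2], n_cores=2, quantum=2))  # → [10, 10]
```

In this example the 20 time units of work end up split evenly between the two cores without any explicit balancing step, which is the effect the paragraph above attributes to round robin scheduling.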
1.3 RESEARCH OBJECTIVES

The chief objective of this research is to develop an approach that schedules processes using simple multiagents and schedules a large number of independent and indivisible jobs on a multicore platform. This scheduling automatically balances the load across many cores, leading to improved throughput. To achieve this objective, it is proposed to carry out the following:

Design of a novel agent-based scheduling algorithm using the Linux kernel.
Performance evaluation of the agent-based scheduling algorithm after selection of the cores and SPEC-defined benchmark processes.
Proposal of a new load balancing mechanism and evaluation of its performance based on several factors.
Proposal of a new time-based agent storage compaction algorithm that uses memory efficiently.
Proposal of a novel task allocation mechanism based on core speed for Heterogeneous Multicore (HMC) systems.

1.4 CONTRIBUTION OF THE THESIS

The research argues that multicore processors pose unique scheduling problems that require a multiagent-based software approach that uses the large number of processors effectively. The work of the dispatcher is eliminated with the help of the processor agents themselves. The scheduling on each processor is similar to the self-scheduling employed in traditional multiprocessor systems. This is possible only because a processor agent is assigned to every processor. It is also shown that substantial enhancements are achieved
in the traditional scheduler that optimizes for CPU cycle utilization. It is found that the average waiting time decreases slowly as the number of cores increases. In conclusion, the novel approach eliminates hardware complexity and improves CPU utilization to the maximum level. In affinity-based scheduling, CPU utilization is maximal for the critical tasks, and idle processors are utilized well for the non-critical tasks. Even though there is a cost of migrating the non-critical tasks to another processor, efficient and maximal utilization of the CPU is the primary concern.

1.5 THESIS OUTLINE

The thesis is organized as follows. The first chapter presents an introduction to multicore architecture, intelligent agents and the limitations of conventional scheduling and load balancing algorithms. The second chapter presents the literature survey of scheduling and load balancing algorithms. The objectives of the thesis are then derived from the inferences of the literature survey. The third chapter explains the novel agent-based scheduling algorithm. The proposed algorithm is implemented on a modified Linux 2.6.11 kernel process scheduler and evaluated in terms of average waiting time. The fourth chapter describes the novel hard-soft affinity processor scheduling and load balancing using agents. The fifth chapter presents core-performance-based agent scheduling and thread assignment for heterogeneous multicore systems. The sixth chapter explains the binary search tree based load balancing algorithm, the equalization of power consumption across processors using load balancing, and automatic load balancing using time-based unused space collection. The seventh chapter reports the conclusions drawn from the research work.