ADAJ: a Java Distributed Environment for Easy Programming Design and Efficient Execution
SCHEDAE INFORMATICAE, VOLUME

ADAJ: a Java Distributed Environment for Easy Programming Design and Efficient Execution

Violeta Felea 1,2, Richard Olejnik 1, Bernard Toursel 1,2

1 Laboratoire d'Informatique Fondamentale de Lille (UMR CNRS 8022), University of Lille 1, Villeneuve d'Ascq CEDEX, France
2 École Polytechnique Universitaire de Lille (Polytech Lille)
{felea, olejnik,

Abstract. In this paper we present ADAJ (Adaptive Distributed Applications in Java), a platform we have developed for distributed Java applications over a cluster of computers. Its objective is to facilitate application design and to use the available computing power efficiently. ADAJ offers both a programming and an execution environment. The programming environment reduces the user's programming effort and offers a MIMD/SPMD programming style. Our approach relies on the concept of distributed collections, which group fragmented objects, and on asynchronous calls. The ADAJ execution environment copes with irregularities in the evolution of the application and in the availability of resources. We present the architecture and the tools we have developed: object observation and load balancing mechanisms. The observation mechanism makes it possible to estimate the JVM load; the load balancing mechanism dynamically adapts the execution according to this information. We report measurements on two different applications in order to evaluate both the benefits and the overhead of ADAJ.

Keywords: methodology of programming, parallelism and distribution, object observation, dynamic load balancing, Java computing.
1. Introduction

Advances in high-speed network technology, together with the growth of the installed computer base and thus the high availability of spare CPU time, have made clusters of workstations a promising alternative to conventional parallel machines. However, the development of large applications distributed over heterogeneous and dynamic networks raises problems which are quite difficult to solve. The first is to make the design easier and as independent as possible of the execution support. The second is to ensure an automatic and optimised distribution of applications on the execution platform. The ADAJ (Adaptive Distributed Applications in Java) project tries to provide answers to both questions, at the level of application design and at the level of distributed execution on networks.

The ADAJ framework concerns distributed objects built around the Java system. We exploit the characteristics that Java offers for the management of distribution, heterogeneity and programming [13]. Java's machine-independent bytecode makes it possible to turn a cluster into a uniform platform, and Java integrates the use of object technologies. We have therefore adopted the following two objectives for ADAJ:

- to simplify the work of the programmer by hiding the problems related to parallelism management: we provide a complete API so that the programmer can easily develop parallel applications and focus only on the complexities of the problem. This paper presents the ADAJ programming environment (see Section 5); we show in particular how parallel programs can be developed with tools such as distributed collections and asynchronous calls;

- to allow dynamic and automatic or quasi-automatic deployment of applications in heterogeneous computing environments; we ensure effective execution by using mechanisms of intra-application load balancing.

For the second point, we introduce at the middleware level a mechanism, based on a particular kind of monitoring, which ensures the automatic adaptation of the processing distribution to the execution and to resource availability. We present the features introduced in the ADAJ execution environment to achieve this goal (see Section 6). After providing some details on the ADAJ system implementation (see Section 7), we give a performance evaluation of the ADAJ environment,
at the two levels: programming and execution. Both benefits and overhead are estimated (see Section 8).

2. Related work

Representative works on Java distributed environments for cluster computing may be classified by the criterion of offering a single system image (SSI) to the user, i.e. a transparent view of the cluster. Three different approaches can be distinguished [2], according to the level of the development environment relative to the Java virtual machine (JVM): below the JVM (Java/DSM [32]), at the JVM level (cJVM [2]), and on top of the JVM (JavaParty [26]). The first two approaches require modifications of the JVM, either to provide a shared memory abstraction over the cluster and to automatically handle communication between machines, or to implement a distributed heap for a distributed memory model. ADAJ, just like the JavaParty project, modifies neither the JVM nor any of its underlying execution aspects; instead, the source code accepts a new keyword, which marks remote objects. This implementation sits above the JVM, using third-party Java packages which support an architecture of several JVMs.

Even though existing projects offer tools aimed at execution optimisation, such as thread migration [31, 6], object migration [10], or communication improvement [20], [22], few [21] include automatic and dynamic redistribution of applications. Some of these approaches have been oriented towards the C/C++ programming language [1, 5], because of easier access to the system calls which supply a good load approximation. Existing Java techniques are less transparent, because they either rely on user annotations [11], or they let the user graphically visualise distributed aspects, such as topology or communications, and control and modify the execution [3]. Such tools lack transparency and automation, relying completely on user decisions.
ADAJ is significantly different from these approaches, as it hides the redistribution of objects from the user, who can still use explicit placement if desired.
3. ADAJ architecture

ADAJ is an execution and programming environment intended for distributed applications. It is implemented above the JavaParty and Java/RMI platforms according to a multi-layer structure (see Fig. 1) using several APIs 1.

- JVM: the first asset of Java is that a set of Java virtual machines can be considered as a homogeneous base for the construction of distributed applications.

- RMI: Remote Method Invocation [28] allows objects placed on various JVMs to communicate as if they were placed on the same JVM. RMI uses the stub/skeleton mechanism for method invocation and the serialisation mechanism for passing parameters and results.

- JavaParty: JavaParty [26] provides an execution environment for applications distributed on a workstation cluster. JavaParty extends the Java language to make it possible to express object distribution. This distribution is largely transparent. JavaParty provides a mechanism for object migration.

Fig. 1. The ADAJ multi-layer structure

On top of this system, ADAJ provides tools to simplify distributed programming. To this end, it provides the concept of distributed collections [15, 14] and an asynchronous invocation mechanism. In addition,

1 Application Programming Interface.
ADAJ carries out load balancing in order to improve application performance. This load balancing mechanism [17] relies on the redistribution of application objects, which exploits information produced by the observation system [23, 24, 25, 9] of the dynamic relations between the objects of the application being executed on the platform.

4. ADAJ object model

The observation mechanism of applications in the ADAJ environment aims to provide knowledge about the behaviour of applications during their execution. This knowledge is obtained by observing the activity of the objects which belong to the application. ADAJ supports two types of objects: local and global objects.

- Local objects: these are traditional Java objects which belong to the user. They can be used only in the JVM where they were instantiated and are not remotely accessible. If these objects are needed on another JVM, they are copied: the running state of the object, more precisely its attributes, is copied to the new JVM. No coherence is maintained between the original and the copy, which is also a local object. Local objects are not observed by the ADAJ monitoring mechanism.

- Global objects: these can be created remotely in any JVM of the cluster. They are both remotely accessible and migratable, corresponding to the JavaParty concept of a remote object, and they are observed by the ADAJ monitoring mechanism.

Therefore, global objects are remotely accessible objects, which means that they can be used remotely or locally by means of a remote reference. They are shared by all the JVMs which hold a reference to them. They are also migratable, implicitly by the ADAJ redistribution mechanism and/or explicitly by the programmer. Global object migration is the basic mechanism of ADAJ load balancing. In order to be automatically chosen for migration, global objects are the targets of the observation mechanism, and are thus observable.
5. The ADAJ programming environment

Deployment of applications is the key to the exploitation of distributed resources in distributed systems. In object-oriented languages, as is the case for Java, objects are subject to the deployment process. An application execution is a succession of method invocations which are executed on objects at the location where those objects are instantiated, thus using the corresponding resources (CPU and memory). This distribution process is generally accompanied by parallelisation techniques; exploiting distributed resources efficiently can only be achieved using parallelism. Consequently, two issues are important in designing high-performing cluster applications: the expression of distribution and the integration of parallelism. In ADAJ these concepts are introduced at the same time, through distributed collections and asynchronous calls.

Distributed collections

A distributed collection is a new concept proposed in ADAJ, meant for the expression of both distribution and parallelism. Distribution is achieved through fragments, the entities of the distributed collection, while parallelism is offered through the processing tools associated with distributed collections.

Usually, a Java collection groups many user objects into a single entity. In an ADAJ distributed collection, these objects are grouped into several fragments, which are global objects distributed over the cluster. This set of fragments is originally structured into a two-level hierarchy: the root is the fragmented collection, which groups a number of distributed fragments at the second level. One fragment is attached to a single distributed collection, which prevents incoherences due to parallel processing; at the same time, this architecture may impose strict constraints on object sharing. In a user-defined structure, fragments may themselves be distributed collections, so a tree architecture can easily be developed.
Fragments may be placed randomly on the machines of the cluster, just like any other global object. This implicit deployment can be overridden: the fragments of a distributed collection can be deployed cyclically or by blocks.

Parallelism concerns processing over the distributed fragments. Applying the same processing to every component (fragment) of the distributed collection would not be efficient unless executed in parallel. Thus, in ADAJ, the activation of a processing on a fragmented collection leads to the dispatching
of this processing, in parallel, over all the fragments. Transparency is one of the aims of the ADAJ development environment, so the parallel tools library completely hides the use of threads from the user.

The definition of fragments allows the user to control the degree and the granularity of parallelism, through the size of the distributed collection and the processing granularity. The number of fragments, which gives the size of the collection, does not have to be fixed when designing the application. It can easily be determined at runtime, dynamically, by creating new fragments or removing existing ones. The user thus controls the degree of parallelism, but also its granularity, which is given by the number of objects in each fragment and by the quantity of parallel processing over fragments.

The concurrent invocation of processing over several fragments raises the problem of result recovery. Two kinds of behaviour are possible, whether or not the contents of fragments change: either no results are returned, or new results are returned and need to be recovered. Consequently, ADAJ proposes a new concept, similar to that of a future object [4], in the form of a collector. The collector has a double functionality, depending on the generation of results:

- if no results are returned, the collector can control the end of all the processing;

- if results are returned, the collector can ensure result recovery in different ways: recovery of all the results, of the first available and not yet consumed result, or of the result issued by a given fragment.

The collector maintains some logical states in order to manage the receipt and the use of the returned results.
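The behaviour of the collector can be illustrated, on a single JVM, with standard Java concurrency utilities. The following sketch is not ADAJ's implementation: the class name CollectorSketch, the constructor shape and the getAll method are assumptions, with a thread pool standing in for the parallel dispatch over remote fragments; only the getOne semantics (first available, not yet consumed result) mirrors the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Single-JVM analogue of the ADAJ collector: one task is submitted per
// "fragment", and results are consumed either one by one, as they become
// available, or all together.
public class CollectorSketch<R> {
    private final CompletionService<R> completion;
    private final int submitted;

    public <F> CollectorSketch(List<F> fragments, Function<F, R> processing) {
        ExecutorService pool = Executors.newFixedThreadPool(fragments.size());
        this.completion = new ExecutorCompletionService<>(pool);
        for (F fragment : fragments) {
            completion.submit(() -> processing.apply(fragment)); // one task per fragment
        }
        this.submitted = fragments.size();
        pool.shutdown(); // no further submissions; running tasks finish normally
    }

    // First available, not yet consumed result (blocks until one is ready).
    public R getOne() {
        try {
            return completion.take().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    // All results, waiting for the end of every processing.
    public List<R> getAll() {
        List<R> results = new ArrayList<>();
        for (int i = 0; i < submitted; i++) results.add(getOne());
        return results;
    }
}
```

The design point the sketch makes concrete is that result order follows completion order, not submission order, which is exactly why the collector needs its "first available" primitive.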
Asynchronous primitives over the collector make it possible to loosen the constraint of waiting simultaneously for all results, thus orienting the programmer towards an asynchronous programming model instead of a synchronous BSP 2 one.

Asynchronous calls

Asynchronism is generated by fragment processing, but the programmer may be interested in parallel behaviour independently of the distributed collection structure. To achieve this, asynchronous calls are provided for every global object. Thus, parallel processing can be activated not only on objects of the distributed collections, but also on objects already

2 Bulk Synchronous Programming.
created by the application, independently of the distributed fragments. Possible result recovery is ensured by the concept of a future object.

The development of applications in ADAJ

Designing a parallel and distributed application proves to be a tedious and difficult task. ADAJ tries to hide, as much as possible, the underlying distribution mechanism and the associated issues, and at the same time provides useful and easy tools to express parallelism. A synthesis of the features of ADAJ as a programming environment is given next, with an example of a distributed application design and with the syntax of parallelism expression.

Characteristics

The ADAJ parallel tools over distributed collections and global objects give a particular programming style, of SPMD 3, or rather of MIMD 4, type. They allow the programmer:

- to easily express object parallelism, in which processing is activated over fragments in a parallel and asynchronous way, and for which the results are recovered asynchronously;

- to express method parallelism;

- to increase the granularity of parallelism naturally, by specifying the processing to be invoked concurrently over objects grouped into fragments, rather than spread over the cluster;

- to design applications without fixing the granularity and degree of parallelism, postponing these decisions to program launch, depending on the data processed and on the features of the execution environment. The degree of parallelism can also vary during execution, by adding or removing fragments;

- to be independent of object deployment and of possible migrations.

3 Single Program Multiple Data. 4 Multiple Instruction Multiple Data.
Example of ADAJ programming

Matrix multiplication is a classical operation in numerical computation, which raises problems of matrix partitioning and distribution when the matrices have large dimensions. We consider two matrices, A and B, to be multiplied in parallel. We suppose that A is divided into lines and that there are three possibilities for the B matrix: to duplicate it on all the machines of the cluster, to share it (the matrix B being entirely stored on a single machine), or to divide it (into columns) and distribute it. In ADAJ, the three possible situations for the B matrix suggest several possible data structures.

A fragmented, B fragmented: two different distributed collections (one fragment of the first collection contains lines of matrix A, and one fragment of the second collection contains columns of matrix B), or one distributed collection (every fragment contains several lines of A and several columns of B). In the first case, an easy generalisation to any other operation over matrices can be obtained, and it offers homogeneity (the two input matrices and the resulting matrix are represented in the same way). In the second case, represented in Fig. 2, the gain is in execution time, because the multiplication of the parts of the A and B matrices held in the same fragment is performed locally. The result can either be kept split across the fragments or put in a separate distributed collection.

Fig. 2. A matrix multiplication by fragmentation of the second matrix
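The parallel structure common to these layouts, with A cut into row blocks, can be sketched on a single JVM in plain Java threads. This is an illustration only, not ADAJ's distributed implementation: the class name is an assumption, each thread-pool task plays the role of one fragment, and B is simply kept whole and visible to every task (the simplest of the layouts discussed here).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Row-fragmented matrix product: each task computes Ci = Ai * B for its own
// block of rows of A, so the tasks write disjoint rows of C and need no
// synchronisation between them.
public class RowFragmentMultiply {
    public static double[][] multiply(double[][] a, double[][] b, int tasks) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        int block = (n + tasks - 1) / tasks; // rows of A per "fragment"
        for (int t = 0; t < tasks; t++) {
            final int lo = t * block, hi = Math.min(n, lo + block);
            pool.execute(() -> {             // one task per row block
                for (int i = lo; i < hi; i++)
                    for (int k = 0; k < m; k++)
                        for (int j = 0; j < p; j++)
                            c[i][j] += a[i][k] * b[k][j];
            });
        }
        pool.shutdown();
        try {                                // wait for the end of all row blocks
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return c;
    }
}
```

In the distributed setting, the efficiency differences between the three layouts come entirely from where B lives; the row-block parallelism over A is the same in all of them.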
A fragmented, B shared: one distributed collection, in which every fragment contains lines of A. For this data structure, the B matrix is kept in one piece, which may make the parallel code easier to write, but efficiency is reduced, because every access to an element of B is a remote call. Auxiliary storage problems may also appear if the B matrix is big.

A fragmented, B duplicated: one distributed collection, in which every fragment contains lines of A and the whole matrix B. Fig. 3 shows two possible data structures for the result matrix: the result is either contained in the fragment itself, or a new distributed collection is returned.

Fig. 3. A matrix multiplication by duplication of the second matrix

Syntax of parallel tools

User fragments are defined by inheritance from the most general ADAJ library class, RemoteFragment:

    class MyFragment extends RemoteFragment {
        public void voidMethod(...) { ... }
        public Object resMethod() { ...; return ...; }
        public Object resMethodParam(Integer intValue) { ...; return ...; }
    }

A distributed collection is formed as shown next:

    DistributedCollection distrCol = new DistributedCollection("MyFragment");
which creates an empty distributed collection whose fragments are of MyFragment type.

Parallelism is expressed quite easily through the use of the parallel primitives associated with the distributed collection. The user programs parallel applications simply, without dealing with any complicated structures such as creating threads, starting them and passing parameters. Two types of parallel primitives over the distributed collection are offered: distributeV, for methods returning no result, and distribute, for methods returning a result, both being static methods of the DistributedTask library class. The implementation of the parallel primitives has produced two solutions: one based on the reflection mechanism [27], which offers untyped parallel tools [14], and one based on code generation [16], which preserves the strong typing of the Java language.

Parallel untyped tools

For the untyped solution, the Java reflection mechanism makes it possible to build and invoke the method to be applied on the fragments, based on the method name and an array of parameters.
Invoking a method resMethod (which returns results), in parallel, over all fragments of the previously defined distributed collection is done in the following manner:

    Collector c = DistributedTask.distribute(distrCol, "resMethod", null);

A similar call, using the distributeV primitive, activates the same processing in parallel when no results are expected:

    Collector c = DistributedTask.distributeV(distrCol, "voidMethod", null);

When parameters need to be passed, as for the resMethodParam method, the parallel call is formed as:

    Collector c = DistributedTask.distribute(distrCol, "resMethodParam", new Object[]{new Integer(10)});

Extensions of the distribute primitives are proposed in order to invoke parallel processing with different parameters for each fragment:

    Collector c = DistributedTask.distributeD(distrCol, "resMethodParam", new Object[]{new Integer(10), new Integer(20)});
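The reflection technique behind these untyped primitives can be demonstrated with the standard java.lang.reflect API. The sketch below is a simplification, not ADAJ's code: the class and method names are assumptions, the applicability test covers reference types only, and the invocations run sequentially here, whereas ADAJ dispatches them in parallel over remote fragments.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Locate a method by name and actual parameters, then invoke it on every
// "fragment" of a plain Java list.
public class UntypedInvoke {
    // Find a public method of the target whose name matches and whose formal
    // parameter types accept the actual arguments.
    static Method findApplicable(Object target, String name, Object[] args) {
        int n = (args == null) ? 0 : args.length;
        for (Method m : target.getClass().getMethods()) { // superclasses included
            if (!m.getName().equals(name) || m.getParameterCount() != n) continue;
            boolean applicable = true;
            Class<?>[] formal = m.getParameterTypes();
            for (int i = 0; i < n; i++)
                if (!formal[i].isInstance(args[i])) { applicable = false; break; }
            if (applicable) return m;
        }
        throw new IllegalArgumentException("the call has not been well built: " + name);
    }

    // Apply the named method to every fragment and collect the results.
    public static List<Object> distribute(List<?> fragments, String name, Object[] args) {
        List<Object> results = new ArrayList<>();
        for (Object fragment : fragments) {
            try {
                results.add(findApplicable(fragment, name, args).invoke(fragment, args));
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return results;
    }
}
```

The sketch also shows the price of the untyped approach: a misspelled method name or an incompatible argument is detected only at runtime, which is precisely what the typed, code-generated solution avoids.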
Asynchronous calls are constructed in a similar way, through static methods of the Asynchronous library class, over global objects (in the example, the object obj):

    Return rv = Asynchronous.mVoid(obj, "voidMethod", null);
    Return rr = Asynchronous.mReturn(obj, "resMethod", null);

Parallel typed tools

The code generation used to produce strongly-typed parallel code relies on a special compiler which writes source code based on the fragment code. For every public method of a fragment of class X, four new public methods are written in a new class XDistr, for the distribute primitives, and one new method in the XAsync class, for asynchronous calls. The previous parallel primitives over distributed collections can then be rewritten:

    Collector c = MyFragmentDistr.voidMethod(distrCol);
    Collector c = MyFragmentDistr.resMethodParam(distrCol, new Integer(10));

Collector/Return primitives

The recovery of results, or the synchronous wait for the end of all processing, is performed using methods associated with the collector (for processing over distributed collections) or with the future object (for asynchronous calls):

    // wait for a first available result
    PackRes res = c.getOne();
    // recovery of the result
    Object result = res.getResult();
    // wait for the end or for the result of the asynchronous call
    rv.waitE();
    rr.get();
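The future-object behaviour of the Return handles has a direct single-JVM analogue in java.util.concurrent. The sketch below is not ADAJ's Asynchronous class: the name is borrowed for illustration, the signatures are assumptions, and a CompletableFuture stands in for the Return object.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Single-JVM analogue of the asynchronous-call primitives.
public class AsyncSketch {
    // Analogue of mReturn: start the call in another thread and hand back a
    // future from which the result is recovered later with join()/get().
    public static <T> CompletableFuture<T> mReturn(Supplier<T> call) {
        return CompletableFuture.supplyAsync(call);
    }

    // Analogue of mVoid: fire the call and keep only a handle on which to
    // wait for termination (the role played by waitE() in the text).
    public static CompletableFuture<Void> mVoid(Runnable call) {
        return CompletableFuture.runAsync(call);
    }
}
```

The caller proceeds with other work between issuing the call and consuming the result, which is the whole point of the asynchronous model.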
6. ADAJ execution environment

The execution environment of ADAJ is designed as a layer over several JVMs located on a pool of computers with heterogeneous hardware and systems, connected by a network. The ADAJ system does not only exploit computing power through the design of parallel applications; it also uses under-used heterogeneous resources by automatically distributing the user application. The remote object concept is necessary in order to distinguish the distributable objects, but it is not enough: remote objects should also be tracked. These objects are the ADAJ global objects, which are subject to redistribution when load balancing is performed. Scheduling distributed objects, by migrating objects to another, lightly loaded machine, should be carried out dynamically and transparently, without any user involvement.

General architecture of the redistribution system

Load balancing is handled by two kinds of software agents: the load extractor and the correction agent, located on each of the cluster's machines. The load extractor computes the JVM load value, based on information issued by the ADAJ observation mechanism (presented next) and on the number of threads. This value is also exploited by the correction agent of every heavily loaded machine when it decides on the redistribution of its objects to one or several lightly loaded ones.

Another agent is the decision controller, present on the host on which the user submits the main program. This component is responsible for gathering information on the machine loads (computed by the load extractors) and for deciding whether an imbalance exists. Once the decision is taken, it dispatches specific directives to the heavily loaded machines. The decision component does not track object references; therefore the correction components, placed on every machine, decide which objects to redistribute, concurrently and only in the case of heavily loaded machines.
Object observation

The originality of the observation mechanism in ADAJ lies in the information that is monitored: the number of method invocations between objects. Generally, for a method invocation, one can observe either the execution time or the size of the information exchanged. In ADAJ, we observe method invocations in terms of their number. This information allows us to estimate object activity and communications, and it is sufficient because, in an object-oriented program, all activity (CPU processing or communication) is achieved through method invocations.

The observation mechanism is made of three components: the graph of objects, the tracer of the relationships between the graph objects, and the observer (see Fig. 4).

Fig. 4. The observation mechanism

The application object graph

An object graph can be built from the relationships between the objects. The relations between the objects (see Fig. 5) represent three types of method invocations:

- method invocations from a global object to a global object (the OGI binary relation),

- method invocations from a global object to all local objects (the OLI unary relation),

- input invocations of the methods invoked on a global object (the II unary relation).

The method invocation tracer

In order to store information concerning the relations between the objects, an adequate data structure is necessary. This structure contains counters which store the relation weights (i.e. the numbers of method invocations). There are three counting types, according to the three observed relation types described above.
Fig. 5. The three types of the global object observations

The storage space of all the counters must be accessible both by the objects, which increment the counters corresponding to the observed relationships, and by the observer, which updates the counters. This space can be organised at several levels: object, class, JVM, or the entire platform. In order to facilitate the management of all the counters, we chose to make each global object manage its own list of counters, and each JVM contain a list of all the global objects observed in that JVM. This list is accessible by the observer so that it can update the stored information.

The observer

The observer provides a range of methods to consult the observation information. This set can be used by a global observer at the execution platform level; the global observer then gathers the information provided by the local observers in each JVM.

The computation of the method activation cannot be a simple addition. A mechanism of information ageing [8] is installed: when deciding on a new distribution, only the recent past is of interest. It is also necessary not to react to micro-phenomena, and therefore to smooth the measurements taken. The observer is responsible for this smoothing of observation information and for weighting recent information against the past.
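One common way to realise such an ageing-and-smoothing scheme is exponential smoothing, sketched below. This is an illustrative assumption, not the mechanism of [8]: the class name, the formula and the weight alpha are all hypothetical, chosen only to show how recent measurements can dominate while old ones decay.

```java
// Aged counter: each reading keeps a fraction (1 - alpha) of the past and
// weights the new measurement by alpha, so old observations decay
// geometrically and micro-phenomena are smoothed out.
public class AgedCounter {
    private final double alpha;   // weight of recent information, in (0, 1]
    private double smoothed;      // aged activity estimate, initially 0

    public AgedCounter(double alpha) { this.alpha = alpha; }

    // Fold one raw measurement (e.g. invocations counted since the last
    // reading) into the aged estimate.
    public double update(double measurement) {
        smoothed = alpha * measurement + (1 - alpha) * smoothed;
        return smoothed;
    }

    public double value() { return smoothed; }
}
```

A larger alpha makes the observer more reactive; a smaller one smooths more aggressively, which is exactly the recent-past/micro-phenomena trade-off described above.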
Object distribution strategy

To measure the load of each machine, the number of threads, recovered through the standard Java tool 5, is not a sufficient metric, because no distinction can be made between active threads and blocked ones. Not all threads can be considered load generators: blocked threads do not consume any CPU cycles. A new measure is therefore introduced in ADAJ to separate the two situations: the JVM workload.

The workload of a JVM depends directly on the workload of every object it contains. The objects of a JVM are either local or global, but only the latter are observed by the observation mechanism. A global object performs work which is linear in the number of input invocations (the number of its methods which have been called) and in the number of invocations towards local objects. The relations between objects, turned into counters with the same names, define the workload of a global object:

    WP(obj) = II(obj) + OGI(obj, obj) + OLI(obj).

The sum of these workloads, over every global object of a Java virtual machine, defines the workload of that JVM. Together with the number of threads, the JVM workload gives a classification of JVMs into three categories: underloaded, overloaded, and normally loaded. The criteria are the following: a JVM is overloaded if both the number of threads and the workload are high, which reflects intense activity of every object in the JVM; conversely, a JVM is underloaded if either the number of threads or the workload is low. Several threshold techniques, needed for the classification, have been proposed and tested. Finally, we chose a mixed metric, using a coefficient of variation and the K-Means algorithm [19], which gives the best results. The correction agent on each overloaded machine decides concurrently on the redistribution of the objects it contains.
In order to avoid ping-pong effects, only one object is redistributed at a time by an overloaded JVM, and the underloaded machines are analysed in a random order. Two aspects are determined by the correction component: the objects concerned by the redistribution and their new destination. The global objects having a weak attraction towards the local JVM (i.e. a low II counter) and an average workload are the best candidates for redistribution. The attraction of an object towards a JVM is quantified in terms of method invocations with the objects residing on that JVM. Redistributing objects with an average workload is required because loads should change appreciably, while heavily loaded objects are difficult to redistribute because of their large number of method invocations.

5 The Java class method java.lang.Thread.activeCount().
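The per-object and per-JVM workload computation described above can be made concrete in a few lines. The sketch is a simplification under stated assumptions: the class and field names are hypothetical, the counters are plain longs rather than aged values, and the coefficient-of-variation/K-Means classification is left out; only the formula WP = II + OGI + OLI and its summation per JVM follow the text.

```java
// Workload of a global object and of a JVM, from the three invocation
// counters of the observation mechanism.
public class WorkloadSketch {
    public static class GlobalObject {
        final long ii, ogi, oli;  // input, global-to-global, global-to-local counters
        public GlobalObject(long ii, long ogi, long oli) {
            this.ii = ii; this.ogi = ogi; this.oli = oli;
        }
        // WP(obj) = II(obj) + OGI(obj) + OLI(obj)
        public long workload() { return ii + ogi + oli; }
    }

    // JVM workload: sum of the workloads of the global objects it contains.
    public static long jvmWorkload(GlobalObject[] objects) {
        long sum = 0;
        for (GlobalObject o : objects) sum += o.workload();
        return sum;
    }
}
```

Paired with a thread count (e.g. from Thread.activeCount()), this per-JVM total is the quantity on which the overloaded/underloaded classification operates.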
The new destination for these objects is computed depending on the external attraction of the objects (similar to the attraction function defined previously, but towards another JVM) and on the workload of the target JVM. The redistribution is achieved through migration. Migrating objects has consequences both on the communication links and on the maintenance, by the observation mechanism, of the list of global objects on each JVM.

7. ADAJ system implementation

7.1. Parallel tools

Two different techniques have been proposed for the parallel tools in ADAJ: the first based on reflection, and the second on code generation. The reflection mechanism raises some difficulties when considering polymorphism and inherited methods. The method to be applied is specified by its name and an array of Object type matching the parameters. The compatibility between the actual and formal parameters is tested for both primitive types and reference types. The search for an applicable method in the class and its superclasses is described in Fig. 6. While this method does not guarantee the strong typing of Java method calls, the second implementation technique ensures it. In this case, a code generation tool is used, which writes parallel code functionally similar to that of the previous technique. The generated classes are presented in Fig. 7.

7.2. Observation: object marking

Marking an object consists in adding a new characteristic to it (see Fig. 8). For instance, we add the migrability property to an object so that it becomes migratable, i.e. it can be moved from one JVM to another. The addition of a mark can be done at the class or at the object level. At the class level, all the objects carry the mark throughout their lifetime. At the object level, the mark is added at object creation; it remains valid during the whole lifespan of the object, but it can be activated or deactivated.
recover all methods defined in the class or in every superclass;
recover one or several methods having the same name and the same number of parameters as the method to invoke;
if (no method is found) then
    the call has not been well built;
else
    while (the method is not found)
        recover the types of the parameters;
        if (the passed parameters are instances of the found types) then
            the method has been found;
        else
            take the next method;
        endif
    endwhile
endif

Fig. 6. The search algorithm for an applicable method

To ensure the transparency and the ease of object creation, we chose marking at the class level. The object marking is done implicitly: the marked objects are those which inherit from the RemoteObject class of JavaParty. The information necessary for this marking is implemented by post-compilation techniques. We used a bytecode instrumentation tool, JavaClass [29], [12], from the Free University of Berlin. For every class which inherits from RemoteObject, the bytecode must be modified so that the corresponding objects can be observed. The post-compilation procedure is composed of two phases (see Fig. 9):

a phase which modifies the bytecode of a JavaParty class; the output of this phase is standard bytecode of the observed class;

a phase in which the result of the first phase is recompiled by the RMI compiler, rmic, in order to generate the stubs and skeletons corresponding to the global class.
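The search algorithm of Fig. 6 maps almost directly onto the Java reflection API [27]. The sketch below is our own rendering, not the ADAJ implementation: it searches the class and its superclasses and checks the actual parameters against the formal reference types; the primitive-type compatibility checks performed by ADAJ are left out for brevity.

```java
import java.lang.reflect.Method;

// Runnable version of the search of Fig. 6: find, by reflection, a method
// applicable to the actual arguments, looking in the class and its
// superclasses and testing parameter compatibility on reference types.
public class MethodSearch {
    public static Method findApplicable(Class<?> cls, String name,
                                        Object[] args)
            throws NoSuchMethodException {
        // getMethods() recovers all public methods, superclasses included.
        for (Method m : cls.getMethods()) {
            if (!m.getName().equals(name)
                    || m.getParameterCount() != args.length) continue;
            Class<?>[] formal = m.getParameterTypes();
            boolean applicable = true;
            for (int i = 0; i < args.length; i++)
                if (!formal[i].isInstance(args[i])) {
                    applicable = false;  // take the next method
                    break;
                }
            if (applicable) return m;    // the method has been found
        }
        throw new NoSuchMethodException(name); // call not well built
    }

    public static void main(String[] args) throws Exception {
        // String.concat(String) is found although the argument is passed
        // in a plain Object array, as in a reflective ADAJ-style call.
        Method m = findApplicable(String.class, "concat",
                                  new Object[] { "bar" });
        System.out.println(m.invoke("foo", "bar")); // prints foobar
    }
}
```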
[Figure: class generation pipeline. From a global class fragment (MyFragment), the ADAJ parser derives local classes for distributed calls (MyFragmentDistr) and asynchronous calls (MyFragmentAsync); the JavaParty compiler then produces the proxy part (MyFragment), the instance part (MyFragment_instance) and the static part (MyFragment_class).]
Fig. 7. Complete class generation in ADAJ

[Figure: a remote class inheriting RemoteObject goes through the post-compiler, whose bytecode transformation yields the marked class.]
Fig. 8. Adding a mark to a class

8. Evaluation

In order to prove the efficiency of the ADAJ environment, the system was evaluated from two points of view: the use of the parallel tools and the object distribution. The results show that ADAJ provides good speedups and, in case of imbalances, improves application execution times by load balancing.

[Figure: the not-yet-observed remote class goes through the post-compiler to become a global class, which rmic then compiles into the global class with its stub and skeleton.]
Fig. 9. The post-compiler for object marking
8.1. Testbed

The experiments were conducted on a network of a dozen Intel monoprocessor machines (733 MHz processors, 128 MB of RAM). All these machines run JVM 1.3 on Linux (Debian 2.2) and are connected by an Ethernet network. The tested applications are of two types: intensive computation, in the form of a genetic island-model algorithm solving the TSP 6 problem, and intensive communication, in the form of a synthetic application.

8.2. Cost of parallel tools

The use of distributed and asynchronous calls makes the expression of parallelism easier for the users of the ADAJ development environment, because the use of threads and the recovery of results are completely transparent. We evaluated the performance of a parallel ADAJ application and compared it to a similar JavaParty application, in order to assess the trade-off between efficiency, ease of use and transparency. The baseline TSP application implements a sequential evolutionary genetic algorithm, in the island form, and runs on a single machine. This application was chosen because of its concurrent, distributed nature and its deterministic behaviour. Naturally, the islands are subpopulations whose processing may be executed in parallel, which is done in the ADAJ distributed version. The execution times show good speedups (see Fig. 10), comparable to those of JavaParty, when using the distributed collection to model the distributed subpopulations and when invoking the corresponding parallel tools. The overall overhead of the ADAJ execution times, compared to the JavaParty ones, was estimated at 1.09% on average when the distributed collection is constructed sequentially and the processing is parallel (see Fig. 11), and at 0.56% on average when both construction and processing are parallel. These results show similar execution performance for ADAJ and JavaParty applications.
Moreover, ADAJ distributed programming is much simpler than the JavaParty or Java/RMI programming style [15].

6 Travelling Salesman Problem.
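For readers unfamiliar with the island model, the parallelism that the ADAJ distributed collection provides transparently can be spelt out in plain Java. None of the names below belong to the ADAJ API, and evolve() is a placeholder for one round of genetic processing; ADAJ hides exactly this thread-and-result-recovery machinery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java analogue of island-model parallelism: each island
// (subpopulation) is processed in parallel, then results are gathered.
public class Islands {
    // Stand-in for one generation of the genetic algorithm: here it just
    // returns the best (smallest) tour length found in the subpopulation.
    public static int evolve(List<Integer> subpopulation) {
        return Collections.min(subpopulation);
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> islands = Arrays.asList(
            Arrays.asList(7, 3, 9), Arrays.asList(4, 8), Arrays.asList(6, 5));
        ExecutorService pool = Executors.newFixedThreadPool(islands.size());
        List<Future<Integer>> results = new ArrayList<>();
        for (List<Integer> island : islands)      // asynchronous "calls"
            results.add(pool.submit(() -> evolve(island)));
        int best = Integer.MAX_VALUE;
        for (Future<Integer> f : results)         // result recovery
            best = Math.min(best, f.get());
        pool.shutdown();
        System.out.println(best);                 // prints 3
    }
}
```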
[Figure: three plots (a)-(c) showing the speedups of the distributed versions for subpopulations of 1000, 1500 and 2000 individuals.]
Fig. 10. Speedups in ADAJ, compared to the JavaParty speedups, for different subpopulation sizes

8.3. Cost of migration

The load balancing mechanism relies almost entirely on the ability to migrate objects from a highly loaded machine to a lightly loaded one. If migration cannot take place (at least one method is executing on the object), the overhead is insignificant. If the migration succeeds, its cost is due to the time spent serialising and deserialising the object, which is still slow in Java. Tests showed this dependency (see Tab. 1).
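The dependency on serialisation can be reproduced with a small stand-alone measurement. This is a sketch, not the ADAJ migration code: absolute times depend on the JVM and the machine, so no figures from Tab. 1 are reproduced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

// Measures a serialise + deserialise round trip for payloads of growing
// size, the operation that dominates the cost of a successful migration.
public class MigrationCost {
    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        for (int n : new int[] { 0, 100, 10_000 }) {
            ArrayList<Integer> payload = new ArrayList<>();
            for (int i = 0; i < n; i++) payload.add(i);
            long t0 = System.nanoTime();
            Object copy = deserialize(serialize(payload));
            long micros = (System.nanoTime() - t0) / 1_000;
            System.out.println(n + " Integers: " + micros
                    + " us, intact=" + payload.equals(copy));
        }
    }
}
```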
Fig. 11. Overhead of using ADAJ distributed collections (only the processing is parallel)

Tab. 1. Costs (in ms) of the JavaParty migration in homogeneous and heterogeneous systems

    object size                              homogeneous    heterogeneous
    empty object                                    13.8             21.6
    object with 100 empty objects                   14.6             23.0
    object with 100 Integer objects                 16.6             26.5
    object with 100 objects,
      each with 100 Integer objects                280.8            277.5

8.4. Benefit of object distribution

The load balancing mechanism was evaluated on two different applications: the TSP problem, implementing a genetic island algorithm, and a communicating synthetic application. The TSP problem was chosen because an ideal final distribution of subpopulations is easily identified, so the evaluation has a reference to compare against. The communicating application shows the importance of taking communications into account when deploying distributed applications. The TSP experiments first considered a number of equally sized subpopulations, distributed unequally over the cluster machines. The diagrams in Figs. 12 and 13 show the temporal evolution of the number of objects on each JVM, for particular initial distributions. The JVMs containing more objects than the average generally tend to transfer objects to JVMs having fewer objects than the average. On reaching a balanced situation, characterised by a particular value of the coefficient of variation, objects
rarely move from one JVM to another; the remaining moves correspond to fluctuations.

[Figure: evolution of the number of objects per JVM (nbobj #0 to #6) over successive inspections, for cst=0.3.]
Fig. 12. The evolution of objects for the initial distribution 0/15/20/5/35/15/30

The same behaviour, towards equalisation of the quantity of work, was also observed for subpopulations of different sizes, the ideal case being this time a final distribution with the same number of individuals to process on every machine. The execution time was consequently improved, depending on the type of initial distribution: from 17% up to 58% on average for equal-size subpopulations, and by approximately 19% for initial distributions of unequal-size subpopulations. In a perfectly balanced case, starting from an initial distribution of equally sized subpopulations, an overhead of 2% up to 7% (depending on the frequency of the imbalance checking) measures both the observation mechanism and the load balancing mechanism together. The overhead of the observation mechanism alone was measured between 0.07% and 2.98%.

Communications are considered by the correction component. The load balancing mechanism in ADAJ balances the load while targeting communication optimisation; it does not react to communication imbalances as such. The second type of application tested showed the importance of taking the communication links between objects into account when making redistribution decisions. The experiment proved that a cyclic pattern of communication between objects can be recovered by the load balancing mechanism: during the correction phase it takes communicating objects [18] into account through the notion of attraction.

[Figure: two plots of the evolution of the number of objects per JVM (nbobj #0 to #6) over successive inspections, for the mixed algorithm with cst=0.3.]
Fig. 13. The evolution of objects for the initial distributions 25/30/30/35/0/0/0 and 40/40/40/0/0/0/0

9. Conclusions

This paper has presented the environment called ADAJ (Adaptive Distributed Applications in Java), which implements a model for distributed and parallel applications. It offers easy and efficient computing in Java. It is made of a development environment, which facilitates the design of applications, and of an execution platform, which improves performance.

9.1. Contributions

The main contributions of the ADAJ environment concern both the design methodology and the execution of applications:

ADAJ offers facilities for parallel and distributed Java programming. It allows users to seamlessly create global objects and to access them transparently, just like local ones. Moreover, this kind of objects
is used to control the granularity and the degree of parallelism, in the case of fragments contained in distributed collections. Also, the asynchronous method calls associated with every global object express method parallelism.

ADAJ transparently applies object redistribution to balance the system load. This is based on a mechanism of object observation, which allows a graph of object interactions and activity to be drawn. The originality of ADAJ lies in exploiting the counting of method invocations, at low cost, as a representation of object activity, in order to define the load of a JVM.

ADAJ is 100% Java compliant: it does not modify the JVM, but uses a specific compiler.

9.2. Future work

Future work concerns the extension of the load balancing tool to issues like heterogeneity and a multi-user execution environment. Heterogeneity covers the variety of computer performance, in terms of processor speed and memory capacity, of system load (in a multi-user execution environment) and of system type (particularly the diversity of Java thread implementations). This heterogeneity, reflected in the redistribution mechanism, would offer inter-application load balancing. The load is then not restricted to the application load, but is also influenced by the load of the whole computer, associated with its performance index. In this case, the decision controller should behave specifically when load exterior to the application exists. A new 100% Java mechanism to estimate the load of a cluster machine, an extension of the one proposed in [7], is now under study.

10. Thanks

We thank Amer Bouchi for his collaboration in the development of the observation mechanism. He is now at the University of Aleppo, Syria.
References

[1] Arabe J., Beguelin A., Lowekamp B., Seligman E., Starkey M.S. and Stephan P.; Dome: Parallel programming in a heterogeneous multi-user environment, Technical Report at Carnegie Mellon University,
[2] Aridor Y., Factor M. and Teperman A.; cJVM: A Single System Image of a JVM on a Cluster, International Conference on Parallel Processing, Fukushima, Japan, 1999, pp.
[3] Baude F., Caromel D., Huet F., Mestre L. and Vayssière J.; Interactive and Descriptor-Based Deployment of Object-Oriented Grid Applications, 11th IEEE International Symposium on High Performance Distributed Computing HPDC-11, Edinburgh, Scotland,
[4] Baude F., Caromel D., Huet F. and Vayssière J.; Communicating Mobile Objects in Java, HPCN, 2000, LNCS 1823, pp.
[5] Bhandarker M.L., Brunner R.K. and Kale L.V.; Run-time Support for Adaptive Load Balancing, in J. Rolim et al. (eds.): IPDPS Workshops, Cancun, Mexico, 2000, LNCS 1800, pp.
[6] Bouchenak S. and Hagimont D.; Zero Overhead Java Thread Migration, Technical Report at Institut National de Recherche en Informatique et en Automatique (INRIA),
[7] Bouchi A., Olejnik R. and Toursel B.; Java tools for measurement of the machine loads, in Advanced Environments, Tools and Applications for Cluster Computing, Mangalia, Romania, 2001, LNCS 2326, pp.
[8] Bouchi A., Toursel B. and Olejnik R.; An observation mechanism of distributed objects in Java, 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, Las Palmas de Gran Canaria, Spain, January
[9] Bouchi A., Olejnik R. and Toursel B.; A new estimation method for distributed Java object activity, 16th IEEE International Parallel and Distributed Processing Symposium, Fort Lauderdale, Florida, April
[10] Busch M.; Adding Dynamic Object Migration to the Distributing Compiler Pangaea, Technical Report at Freie Universität Berlin,
[11] Corradi A., Leonardi L. and Zambonelli F.; High-Level Directives to Drive the Allocation of Parallel Object-Oriented Applications, Proceedings of High-Level Parallel Programming Models and Supportive Environments (HIPS), IEEE CS Press, Geneva, Switzerland,
[12] Dahm M.; Byte Code Engineering, JIT 99: Java-Informations-Tage, 1999.
[13] Farley J.; Java Distributed Computing, O'Reilly,
[14] Felea V., Devesa N., Lecouffe P. and Toursel B.; Expressing Parallelism in Java Applications Distributed on Clusters, IWCC: NATO International Workshop on Cluster Computing, Romania, September
[15] Felea V., Devesa N. and Toursel B.; Les collections distribuées: un outil pour la conception d'applications Java parallèles, Technique et science informatiques, 22 (3), 2003, pp.
[16] Felea V. and Toursel B.; Methodology for Java distributed and parallel programming using distributed collections, 16th International Parallel and Distributed Processing Symposium, Fort Lauderdale, Florida, April
[17] Felea V.; Exploiting runtime information in load balancing strategy, DAPSYS: Fourth Austrian-Hungarian Workshop on Distributed and Parallel Systems, Linz, Austria, September
[18] Felea V. and Toursel B.; Middleware-based Load Balancing for Communicating Java Objects, CIPC Proceedings, Sinaia, Romania, 2003, pp.
[19] Hartigan J.A. and Wong M.A.; A K-Means Clustering Algorithm, Applied Statistics, 28, 1979, pp.
[20] Maassen J., Nieuwpoort R., Veldema R., Bal H. and Plaat A.; An Efficient Implementation of Java's Remote Method Invocation, ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), Atlanta, Georgia, USA, 1999, pp.
[21] Neary M.O., Brydon S.P., Kmiec P., Rollins S. and Cappello P.; Javelin++: scalability issues in global computing, Concurrency: Practice and Experience, 12 (8), 2000, pp.
[22] Nester C., Philippsen M. and Haumacher B.; A More Efficient RMI for Java, ACM Java Grande Conference, San Francisco, CA, 1999, pp.
[23] Olejnik R., Bouchi A. and Toursel B.; A Java object policy for load balancing, PDPTA: The International Conference on Parallel and Distributed Processing Techniques and Applications, 2, June 2002, Las Vegas, USA, pp.
[24] Olejnik R., Bouchi A. and Toursel B.; Object observation for a Java adaptive distributed application platform, PARELEC: International Conference on Parallel Computing in Electrical Engineering, September 2002, Poland, pp.
[25] Olejnik R., Bouchi A. and Toursel B.; Observation Policy in ADAJ, accepted to Parallel and Distributed Computing and Systems (PDCS), USA,
[26] Philippsen M. and Zenger M.; JavaParty - Transparent Remote Objects in Java, Concurrency: Practice & Experience, 9 (11), 1997, pp.
[27] Sun products, JDK 1.2; Java Core Reflection,
[28] Sun products, JDK 1.2; Remote Method Invocation,
[29] The Jakarta Project; JavaClass: the Byte Code Engineering Library,
[30] Verbièse L., Lecouffe M.P. and Toursel B.; Distribution and load balancing in Acada, PARELEC 98, Bialystok, Poland, September
[31] Weyns D., Truyen E. and Verbaeten P.; Serialization of a Distributed Execution State in Java, Proceedings of Net.ObjectDays NODe 02, Erfurt, Germany, September
[32] Yu W. and Cox A.; Java/DSM: A platform for heterogeneous computing, Workshop on Java for Science and Engineering Computation, Las Vegas, June, ACM.

Received February 11, 2004
www.ijcsi.org 186 Proposal of Dynamic Load Balancing Algorithm in Grid System Sherihan Abu Elenin Faculty of Computers and Information Mansoura University, Egypt Abstract This paper proposed dynamic load
Six Strategies for Building High Performance SOA Applications
Six Strategies for Building High Performance SOA Applications Uwe Breitenbücher, Oliver Kopp, Frank Leymann, Michael Reiter, Dieter Roller, and Tobias Unger University of Stuttgart, Institute of Architecture
Windows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
Instrumentation Software Profiling
Instrumentation Software Profiling Software Profiling Instrumentation of a program so that data related to runtime performance (e.g execution time, memory usage) is gathered for one or more pieces of the
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
1. Overview of the Java Language
1. Overview of the Java Language What Is the Java Technology? Java technology is: A programming language A development environment An application environment A deployment environment It is similar in syntax
FIPA agent based network distributed control system
FIPA agent based network distributed control system V.Gyurjyan, D. Abbott, G. Heyes, E. Jastrzembski, C. Timmer, E. Wolin TJNAF, Newport News, VA 23606, USA A control system with the capabilities to combine
language 1 (source) compiler language 2 (target) Figure 1: Compiling a program
CS 2112 Lecture 27 Interpreters, compilers, and the Java Virtual Machine 1 May 2012 Lecturer: Andrew Myers 1 Interpreters vs. compilers There are two strategies for obtaining runnable code from a program
Distributed communication-aware load balancing with TreeMatch in Charm++
Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration
Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment
Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment Wyatt Spear, Allen Malony, Alan Morris, Sameer Shende {wspear, malony, amorris, sameer}@cs.uoregon.edu
CS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun
CS550 Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun Email: [email protected], Phone: (312) 567-5260 Office hours: 2:10pm-3:10pm Tuesday, 3:30pm-4:30pm Thursday at SB229C,
Preserving Message Integrity in Dynamic Process Migration
Preserving Message Integrity in Dynamic Process Migration E. Heymann, F. Tinetti, E. Luque Universidad Autónoma de Barcelona Departamento de Informática 8193 - Bellaterra, Barcelona, Spain e-mail: [email protected]
Load Balancing on a Non-dedicated Heterogeneous Network of Workstations
Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department
Control 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
Tier Architectures. Kathleen Durant CS 3200
Tier Architectures Kathleen Durant CS 3200 1 Supporting Architectures for DBMS Over the years there have been many different hardware configurations to support database systems Some are outdated others
A Thread Monitoring System for Multithreaded Java Programs
A Thread Monitoring System for Multithreaded Java Programs Sewon Moon and Byeong-Mo Chang Department of Computer Science Sookmyung Women s University, Seoul 140-742, Korea [email protected], [email protected]
Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert
Ubiquitous Computing Ubiquitous Computing The Sensor Network System Sun SPOT: The Sun Small Programmable Object Technology Technology-Based Wireless Sensor Networks a Java Platform for Developing Applications
BSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China [email protected],
An Easier Way for Cross-Platform Data Acquisition Application Development
An Easier Way for Cross-Platform Data Acquisition Application Development For industrial automation and measurement system developers, software technology continues making rapid progress. Software engineers
Write Barrier Removal by Static Analysis
Write Barrier Removal by Static Analysis Karen Zee and Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 {kkz, [email protected] ABSTRACT We present
Fault Tolerance in Hadoop for Work Migration
1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
Various Schemes of Load Balancing in Distributed Systems- A Review
741 Various Schemes of Load Balancing in Distributed Systems- A Review Monika Kushwaha Pranveer Singh Institute of Technology Kanpur, U.P. (208020) U.P.T.U., Lucknow Saurabh Gupta Pranveer Singh Institute
Contributions to Gang Scheduling
CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,
Scientific Computing Programming with Parallel Objects
Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore
The Java Series. Java Essentials I What is Java? Basic Language Constructs. Java Essentials I. What is Java?. Basic Language Constructs Slide 1
The Java Series Java Essentials I What is Java? Basic Language Constructs Slide 1 What is Java? A general purpose Object Oriented programming language. Created by Sun Microsystems. It s a general purpose
Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.
Handout 1 CS603 Object-Oriented Programming Fall 15 Page 1 of 11 Handout 1 Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner. Java
