Runtime Locality Optimizations of Distributed Java Applications


16th Euromicro Conference on Parallel, Distributed and Network-Based Processing

Christian Hütter, Thomas Moschny
University of Karlsruhe

Abstract

In distributed Java environments, locality of objects and threads is crucial for the performance of parallel applications. We introduce dynamic locality optimizations in the context of JavaParty, a programming and runtime environment for parallel Java applications. Until now, an optimal distribution of the individual objects of an application had to be found manually, which has several drawbacks. Based on a former static approach, we develop a dynamic methodology for automatic locality optimizations. By measuring processing and communication times of remote method calls at runtime, a placement strategy can be computed that maps each object of the distributed system to its optimal virtual machine. Objects are then migrated between the processing nodes in order to realize this placement strategy. We evaluate our approach by comparing the performance of two benchmark applications with manually distributed versions. It is shown that our approach is particularly suitable for dynamic applications where the optimal object distribution varies at runtime.

1. Introduction

Java enables developers to express concurrency and to create parallel applications by means of threads. Performance gains over a sequential solution can only be expected if the virtual machine is executed on a system with several processors. JavaParty [10] extends Java by a distributed runtime environment that consists of several Java virtual machines. The virtual machines are executed on the nodes of a cluster of workstations. Each virtual machine has its own address space, but can perform remote method invocations on other virtual machines. Thus, JavaParty allows for performance gains through parallelism in a distributed environment. Solely distributing objects and threads over virtual machines is not sufficient for achieving performance gains. Since the placement of an object determines the processor of its methods, only methods of objects that reside on different machines can actually be executed in parallel. So we have two conflicting goals: on the one hand, groups of objects with frequent and expensive communication should be placed on the same node; on the other hand, objects should be distributed over the available processors to enable parallelism. JavaParty provides a mechanism to create remote objects on specific nodes of a cluster environment. The developer is responsible for distributing the individual objects and thus for distributing the activities to the processing nodes. Such a manual approach has several disadvantages. First, the object distribution depends on the specific topology for which the program is compiled; the distribution strategy must be adapted to each target platform. Second, manually specifying the location of every single object creation is tedious. Third, the optimal placement of objects often cannot be determined statically for dynamic applications where the optimal location of objects changes at runtime. The work at hand focuses on the automatic generation of a distribution strategy for remote objects. The generation is based on runtime information of the distributed system. Thus, the programmer does not have to worry about a proper object distribution and can focus on the solution of the problem.
Even if the initial object distribution generated by JavaParty is not optimal, the locality of the application is optimized at runtime. In Chapter 2 we give a brief overview of JavaParty. Chapter 3 discusses related work in the field of distributed Java applications. In Chapter 4 we describe the design of our approach and explain some basic concepts that are necessary for further understanding.

Chapter 5 presents the implementation and discusses the problems we encountered. In Chapter 6 we evaluate the effectiveness and efficiency of our work using two benchmark applications. Finally, Chapter 7 concludes this paper.

2. JavaParty

JavaParty extends Java by a pre-processor and a runtime environment for distributed parallel programming in workstation clusters. It transparently adds remote objects to Java whose methods can be invoked from remote virtual machines. Programmers can use the keyword remote to indicate that a class should be remotely accessible. Instances of remote classes are called remote objects, regardless of which virtual machine they reside on. The runtime system offers a mechanism to migrate remote objects between machines. Java Remote Method Invocation (RMI) [14] permits the creation of classes whose instances can be accessed remotely from other JVMs. JavaParty uses RMI as its target and thus inherits some of its advantages, e.g. distributed garbage collection. It uses a special pre-processor to generate pure Java source code that is consistent with the RMI requirements. This approach hides the increased program complexity due to RMI constraints as well as the additional code for creation and access of remote objects. JavaParty code is transformed into regular Java code plus RMI hooks. The resulting RMI portions are fed into the RMI compiler to generate stubs and skeletons. Since existing code might be using the original classes, handle objects are introduced that hide the RMI classes from the user. This approach maintains the Java object semantics such that the programmer can use remote objects just like normal Java objects.
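The remote keyword is best seen in a small example. The following sketch is illustrative (the class names and the squaring method are our own); note that remote is JavaParty syntax handled by the pre-processor, not plain Java:

```java
// JavaParty source, not plain Java: the remote modifier marks the class as
// remotely accessible; the pre-processor translates it into RMI-based code.
public remote class Worker {
    public int square(int x) {
        return x * x;            // runs on the JVM where this Worker lives
    }
}

class Main {
    public static void main(String[] args) {
        Worker w = new Worker(); // placed on some node of the environment
        int r = w.square(42);    // transparently a (possibly remote) call
        System.out.println(r);
    }
}
```

From the programmer's point of view, the Worker instance is used exactly like a local object; whether the call crosses JVM boundaries depends only on where the runtime placed it.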
3. Related work

This section gives an overview of existing systems for distributed execution of Java applications. The goal of these systems is to gain increased computational power while preserving Java's parallel programming paradigm. In [3], distributed runtime systems are categorized into cluster-aware VMs, compiler-based DSM systems, and systems using standard JVMs. The first category consists of systems that use a non-standard JVM on each node to execute distributed applications. The most important examples of such systems are cJVM [2] and JESSICA2 [16]. Both approaches provide a complete single system image of a standard JVM. The advantage of using non-standard JVMs is increased efficiency due to the ability to access machine resources directly rather than through the JVM. A weakness of such systems is their lack of cross-platform compatibility. cJVM aims at virtualizing a cluster and at obtaining high performance for regular Java applications. A number of optimization techniques are used to address caching, locality of execution, and object placement. The smart proxy mechanism of cJVM can be used as a framework to implement different locality protocols. Currently, cJVM is unable to use a standard JIT compiler and does not implement a custom one. JESSICA2 applies transparent Java thread migration to multi-threaded Java applications. The migration mechanism allows distributing threads among cluster nodes at runtime. To support shared object access, a global object space has been implemented. The system includes some important features, e.g. load balancing through thread migration, an adaptive home-migration protocol, and a custom JIT compiler. Other systems compile the source or class files of a Java application into native machine code. Both Hyperion [1] and Jackal [15] support standard Java and do not change its programming paradigm. The usage of a custom source or byte code compiler has the disadvantage that such a compiler must continually be adapted to changes of the Java language specification. The advantage of compiler-based systems is their increased performance because of compiler optimizations and direct access to system resources. Hyperion offers an infrastructure for heterogeneous clusters providing the illusion of a single JVM. The original Java threads are mapped onto native system threads which are spread across the processing nodes to provide load balancing. The Java memory model is implemented by a DSM protocol, so the original semantics of the Java language is kept unchanged. To achieve portability, the Hyperion platform has been built on top of a portable runtime environment which supports various networks and communication interfaces. Jackal is a DSM system for Java which consists of an optimizing compiler and a runtime system. In combination with compiler optimizations, Jackal applies various runtime optimizations to increase locality and manage large data structures. The runtime system includes a distributed garbage collector and provides thread and object location transparency. While most systems use standard JVMs, only a few of them preserve the standard Java programming paradigm. Examples of such systems are JavaSymphony [4] and ADAJ [5].

Using standard JVMs has the advantage that such systems can use heterogeneous nodes which locally optimize their performance using a JIT compiler. The main disadvantage of such systems is their relatively slow access to system resources. JavaSymphony is a programming environment for distributed and parallel computing that exploits heterogeneous resources. In order to use JavaSymphony efficiently, the programmer has to explicitly control data locality and load balancing. The structure of the computing resources has to be defined manually. Since all objects must be created, mapped, and freed explicitly, the handling of remote objects can be quite cumbersome. JavaSymphony does not offer assistance for those manual steps, so the semi-automatic distribution is likely to be error-prone. ADAJ is an environment for the development and execution of distributed Java applications. ADAJ is designed on top of JavaParty and is therefore most closely related to our work. The ADAJ project deals with placement and migration of Java objects. It automatically deploys parallel Java applications on a cluster of workstations by monitoring the application behavior. ADAJ contains a load-balancing mechanism that considers changes in the evolution of the application. While the focus of ADAJ is to balance the load between the individual JVMs, we concentrate on optimizing the locality of the distributed application.

4. Design

4.1. Locality optimizations

Philippsen and Haumacher proposed locality optimizations in JavaParty by means of static type analysis [11]. They classify approaches to deal with locality in parallel object-oriented languages into three categories: (i) let the programmer specify placement and migration explicitly by means of annotations, (ii) static object distribution, where the compiler tries to predict the best node for a new object, and (iii) dynamic object distribution, based on a runtime system that keeps track of the call graph. JavaParty already provides mechanisms for manual object placement and migration, so we focus on static and dynamic object distribution in the following.

4.1.1. Static object distribution

Although a Java thread cannot migrate, the control flow (called activity in the following) can: when a method of a remote object is invoked, the activity conceptually leaves the JVM of the caller and is continued at the callee's JVM, where it competes with other activities. Due to time-slicing and blocking, competing activities on one JVM decrease the total parallelism. Additional costs are introduced by the remote method invocation itself because of communication latency and bandwidth limitations. Thus, the general distribution strategy must be activity-centered: different activities should be placed onto different JVMs. Objects should be co-located with activities such that method invocation is local. Local method invocation avoids network communication and competing activities. Haumacher proposes an iterative procedure [6] to assign objects to activities and then activities to virtual machines. Based on a static type analysis, estimates for two values are derived: work(t, a) describes the computing time that activity t spends on methods of object a, and cost(t, a) describes the communication time that would be necessary if t and a were not located in the same address space. Through the placement of object a, the computing time gained by the activity t in whose address space a is created should be maximized.
At the same time, the sum of the communication costs incurred by those activities t_i assigned to remote virtual machines should be minimized. We assume an initial setting where all objects are located in a single address space with a single processor, such that all method calls are local. In order to distribute objects to activities, we suppose that each activity is running in a different address space with its own processor. By placing object a in the address space of activity t, method calls of a by t can be executed in parallel with other activities. Thus, work(t, a) indicates the time that is gained by the placement of a within the address space of t. This gain breaks even with the communication cost that other activities t_i spend to access methods of a exactly when work(t, a) equals the sum of the cost(t_i, a). So each object a can be mapped to the activity t in whose address space it should be placed:

    activity(a) = argmax over t of ( work(t, a) − Σ_{t_i ≠ t} cost(t_i, a) )

Since usually more activities are used than virtual machines are available, several activities must share a virtual machine. Thus, it is necessary to identify groups of activities that should be executed on a shared virtual machine. The parallelization win of each activity can be estimated by mapping each object to its optimal activity. The parallelization win is the sum of work(t, a) over all objects a which reside in the address space of activity t, minus the sum of cost(t, b) over all objects b that are placed remotely:

    win(t) = Σ_{a : activity(a) = t} work(t, a) − Σ_{b : activity(b) ≠ t} cost(t, b)

The sum of work(t, a) represents the computing time that activity t spends in its own address space. This work is done in parallel with other activities if no synchronization mechanisms are used. The time that is spent on communication with other address spaces is represented by the sum of cost(t, b) over all objects b that are not assigned to activity t. Note that we charge the cost of a remote call to the activity that invoked the remote method, not to the activity that actually executes the method call. Activities are assigned to the available virtual machines in decreasing order of their parallelization wins until a single activity has been scheduled to each virtual machine. For each remaining activity, a new parallelization win is computed that accounts for the potential co-location with other activities. The activity is assigned to the group of activities with the highest combined parallelization win. This process is repeated until all activities are scheduled to their optimal virtual machine. The result of the distribution analysis is a mapping of each remote object to the virtual machine on which it should be placed.
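To make the two formulas concrete, the following sketch computes the object-to-activity mapping and the parallelization win from dense work and cost matrices. It is a minimal re-implementation of the formulas above under assumed matrix inputs, not the actual JavaParty code:

```java
// Sketch: objects are assigned to activities by the argmax rule, and the
// parallelization win of each activity is derived from that assignment.
class DistributionSketch {
    // activity(a) = argmax over t of ( work(t,a) - sum_{ti != t} cost(ti,a) )
    static int[] assignObjects(double[][] work, double[][] cost) {
        int activities = work.length, objects = work[0].length;
        int[] activityOf = new int[objects];
        for (int a = 0; a < objects; a++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int t = 0; t < activities; t++) {
                double gain = work[t][a];
                for (int ti = 0; ti < activities; ti++)
                    if (ti != t) gain -= cost[ti][a];
                if (gain > best) { best = gain; activityOf[a] = t; }
            }
        }
        return activityOf;
    }

    // win(t) = sum of work(t,a) over owned objects a
    //        - sum of cost(t,b) over remotely placed objects b
    static double win(int t, int[] activityOf, double[][] work, double[][] cost) {
        double win = 0.0;
        for (int a = 0; a < activityOf.length; a++)
            win += (activityOf[a] == t) ? work[t][a] : -cost[t][a];
        return win;
    }
}
```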

4.1.2. Dynamic object distribution

While Philippsen and Haumacher focus on static object distribution through type analysis, we rely on dynamic object distribution to improve locality. This approach is reported to have two disadvantages: first, there is no knowledge about future call graphs or invocation frequencies; second, the creation of objects that cannot migrate often results in a broad redistribution of other objects. The first problem is inherent to dynamic approaches, but can be softened by using heuristics to predict future behavior. The second problem is less of an issue in homogeneous cluster environments and can be handled by avoiding cyclic redistributions of remote objects. Besides these problems, the dynamic approach has the essential advantage that instead of estimating the values of work and cost, they can be measured: we take work as the actual execution time of a method call and cost as the communication time of a remote method invocation. As detailed later, we have to estimate the cost of remote calls that are actually executed locally because the called object resides on the same node. We adapt Haumacher's approach and use an iterative procedure to distribute objects to activities and then assign activities to virtual machines. Objects are migrated to the virtual machine their optimal activity is assigned to.

4.2. Time measurements

Having developed a placement methodology for remote objects, we now focus on how to measure the time values required for the distribution algorithm. Beginning with the Pentium processor, Intel allows the programmer to access a time-stamp counter [8]. This counter keeps an accurate count of every cycle that occurs on the processor: it starts at zero and is incremented every clock cycle. To access the counter, programmers can use the RDTSC (read time-stamp counter) instruction. We use the counter to get a time estimate for the duration of method invocations. Note that the time-stamp counter measures cycles, not time. Thus, comparing cycle counts only makes sense on processors of the same speed, as in a homogeneous cluster environment. To compare processors of different speeds, the cycle counts would have to be converted into time units. While the unit of time returned by System.currentTimeMillis() is a millisecond, the granularity of the value depends on the underlying OS and may be larger. Thus, the time-stamp counter also allows much finer measurements.
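A JNI wrapper of the kind our measurement framework uses (see Section 5.1) might look as follows; only the class and method names follow Table 1, while the library name and the native implementation shown in the comment are assumptions. A duration is then obtained as the difference of two counter reads.

```java
// Sketch of a JNI wrapper for RDTSC; the native library name is hypothetical.
public final class RDTSC {
    static {
        System.loadLibrary("rdtsc");    // loads the native part, e.g. librdtsc.so
    }

    // Returns the raw cycle count of the processor's time-stamp counter.
    public static native long readccounter();

    private RDTSC() {}                  // static utility, no instances
}

/* A matching C implementation for x86 (GCC inline assembly) could be:
 *
 *   JNIEXPORT jlong JNICALL Java_RDTSC_readccounter(JNIEnv *env, jclass cls) {
 *       unsigned int lo, hi;
 *       __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
 *       return ((jlong) hi << 32) | lo;
 *   }
 */
```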
To avoid measurement errors due to concurrency, we assume that the workstations of the cluster are used exclusively for JavaParty. In the presence of background jobs, cycle counting does not always reflect the real execution time of an application. But in the long run, the interrupts caused by background jobs are approximately the same for all workstations of a homogeneous cluster. Thus, we assume that those interrupts balance out over time such that cycle counting actually reflects the average execution time.

4.3. Remote Method Invocation

RMI uses a standard mechanism for communicating with remote objects: stubs and skeletons. A stub for a remote object acts as a local representative or proxy for the remote object. The stub hides the serialization of parameters and the network communication, whereas the skeleton is responsible for dispatching the call to the actual remote object implementation. We want to measure work(t, a) and cost(t, a) in order to apply the distribution algorithm. In the context of stubs and skeletons, work corresponds to the time that the actual method implementation takes, and cost corresponds to the time that is required for carrying out the remote call, i.e. marshaling and transmitting parameters and result. For a remote object r, a stub is instantiated on each node, while only one skeleton is instantiated on the node where the implementation of r resides. That is, there are n stubs and one skeleton for each remote object. Basically, our approach is to measure the communication time of a remote call in the stub and the execution time of the implementation in the skeleton by using the RDTSC instruction. We store aggregated work and cost values in the skeleton.
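Conceptually, this instrumentation amounts to the following sketch; the stub and skeleton classes, the direct dispatch path, and the bookkeeping are simplified, single-threaded stand-ins for the generated code described in Section 5.2, reusing the RDTSC wrapper from above:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, single-threaded sketch of the instrumented call path.
class SkeletonSketch {
    final Map<Long, long[]> times = new HashMap<>(); // threadId -> {work, cost}
    private long lastWork;                           // work of the latest call

    int dispatch(int arg, long threadId) {
        long start = RDTSC.readccounter();
        int result = arg * arg;                      // the actual implementation
        lastWork = RDTSC.readccounter() - start;     // measured work
        times.computeIfAbsent(threadId, k -> new long[2])[0] += lastWork;
        return result;
    }

    // Called by the stub once the remote call has returned.
    void reportTotal(long threadId, long totalCycles) {
        long cost = totalCycles - lastWork;          // communication overhead
        times.computeIfAbsent(threadId, k -> new long[2])[1] += cost;
    }
}

class StubSketch {
    int compute(int arg, SkeletonSketch target, long threadId) {
        long start = RDTSC.readccounter();
        int result = target.dispatch(arg, threadId); // marshal, transmit, execute
        target.reportTotal(threadId, RDTSC.readccounter() - start);
        return result;
    }
}
```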

5. Implementation

5.1. Time measurements

Our framework for performance measuring wraps the RDTSC instruction described in the previous chapter using the Java Native Interface [13]. As detailed in Table 1, accessing the system time is orders of magnitude more expensive than using the RDTSC instruction. Times were measured on a Pentium III 800 MHz system.

Table 1. Cost of RDTSC.readccounter() versus System.currentTimeMillis(), in cycles and µs.

5.2. KaRMI

KaRMI [12] is a fast replacement for Java RMI. It is based on an efficient object serialization mechanism that replaces regular Java serialization. Since the remote method invocation protocol is different from Java RMI, the format of stubs and skeletons is different, too. The KaRMI compiler generates stub and skeleton classes from compiled remote classes. We modified the generation of stubs and skeletons to include code that measures the execution times of remote calls. The measured times are processed by the distribution task to compute an optimal object distribution. More precisely, we modified the generation of stubs to measure the total execution time of remote calls. Once a remote call returns, the stub sends the total time to the skeleton, which measured the execution time of the actual implementation (i.e. work). Using both values, we compute cost as the difference between the total time and work. In order to transmit the total time from stub to skeleton, we added methods to send and receive the measured times to the client and server side of the connection. These methods are called after a remote method invocation has been completed and the result is marshaled back to the caller. Finally, the work and cost values are stored in the skeleton using a special data structure described later.

5.3. Estimation of cost

An important optimization carried out by JavaParty is that a call is only executed remotely if the called object actually resides on another node. Otherwise, the call is executed locally. Recall that cost(t, a) estimates the communication time that would be necessary if activity t were not located on the same node as object a. While we are able to measure the actual communication time of remote calls, we have to estimate the cost of local calls as if they were remote. Thus, we have to develop a model that estimates the communication cost based on the measured cost of a local call. Whenever the client and server objects are in the same address space, arguments and result are cloned to preserve the copy semantics of a remote call. JavaParty produces a deep clone, with all referenced objects also being cloned. In the generated stubs, the instrumented version of the local short cut measures the cost of cloning arguments and return value. The measurement can be divided into three parts: cloning of the arguments, local method invocation, and cloning of the result. Based on the measured local cost of cloning arguments and result, we estimate the communication cost the call would have incurred had it been remote. For this purpose we analyzed the results of a benchmark suite that measures the execution times of local and remote method calls for a representative set of parameter types. Given the duration of a local call, we estimate how long a remote call would take. While the absolute values are likely to vary on different machines, the relation between local and remote calls should be approximately the same. For simplicity, we assume a linear model with offset a and gradient b:

    remote cost = a + b · (local cost)

We applied a nonlinear least-squares algorithm to the results of the benchmark suite in order to fit the estimate function and determine the values of a and b.
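A sketch of such a model follows, with an ordinary (linear) least-squares fit standing in for the nonlinear fitting procedure we actually used:

```java
// Linear cost model: remote cost = a + b * (local cost).
class CostModel {
    private final double a;   // offset
    private final double b;   // gradient

    CostModel(double a, double b) { this.a = a; this.b = b; }

    // Estimate what a locally executed (short-cut) call would have cost
    // if it had actually gone over the network.
    double estimateRemoteCost(double localCost) {
        return a + b * localCost;
    }

    // Ordinary-least-squares fit over benchmark pairs (local[i], remote[i]).
    static CostModel fit(double[] local, double[] remote) {
        int n = local.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += local[i];
            sy += remote[i];
            sxx += local[i] * local[i];
            sxy += local[i] * remote[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        return new CostModel(a, b);
    }
}
```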
5.4. Smoothing and storing time values

We use a hash map to store time values, mapping activities to work and cost values. JavaParty assigns a globally unique thread id to activities that are involved in remote calls. If a new measurement is to be stored, the given thread id is mapped to a pair of work and cost values. We store these values directly with the skeleton, so the addressed object is implicit. Since work and cost indicate the computing and communication times an activity spends on all methods of an object, we have to aggregate the values of the individual methods in a reasonable way. We use an exponential moving average, which has the following advantages over simply adding up the time values: first, the weighting for each data point decreases exponentially, giving more importance to recent observations while still not discarding older observations; second, the weighting makes our measurement more robust against outliers, e.g. delayed execution because of distributed garbage collection; third, the exponential moving average is easy to compute and thus a relatively cheap operation.
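A minimal sketch of such an average; the smoothing factor alpha is an assumption and not a value fixed by our implementation. A pair of such averages per thread id takes the place of plain sums in the hash map described above.

```java
// Exponential moving average: ema = alpha * sample + (1 - alpha) * ema.
class MovingAverage {
    private final double alpha;     // smoothing factor in (0, 1]
    private double value;
    private boolean initialized;

    MovingAverage(double alpha) { this.alpha = alpha; }

    void add(double sample) {
        // The first sample seeds the average; later samples are blended in
        // with exponentially decreasing weight for older observations.
        value = initialized ? alpha * sample + (1.0 - alpha) * value : sample;
        initialized = true;
    }

    double get() { return value; }
}
```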

5.5. Application monitoring

JavaParty offers an interface that allows plugging in additional classes that can be used for monitoring the distributed environment. In our case, the monitor interface is implemented as an invisible task that collects runtime data based on instrumentation. This data is used to analyze the distribution of remote objects over the virtual machines. In JavaParty, references to remote objects are stored in a distributed fashion. Thus, we have to iterate over all virtual machines to obtain references to the remote objects. These references are used to collect the measured times. The monitor also serves as front end for the distribution task, which can either be scheduled for repeated fixed-delay execution or invoked manually via a library call, as sketched below. Basically, our distribution task fetches the measured times and runs the distribution algorithm discussed in Section 4.1. The distribution algorithm sorts the application threads according to their parallelization wins. Each activity is assigned to a group of activities which are optimally placed on the same virtual machine. Finally, each object is assigned to its optimal JVM and possibly migrated there. The migration succeeds only for objects that are not declared to be resident. If nothing was changed during the migration, the distribution task is canceled.
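A sketch of how such a task can be scheduled with java.util.Timer; the steps in the run body are placeholders for the procedure described above, not the actual monitor code:

```java
import java.util.Timer;
import java.util.TimerTask;

// Sketch: the distribution task runs with fixed delay or on demand.
class MonitorSketch {
    private final Timer timer = new Timer("distribution-task", true); // daemon

    void scheduleFixedDelay(long delayMs) {
        timer.schedule(new TimerTask() {
            @Override public void run() { runDistribution(); }
        }, delayMs, delayMs);   // repeated fixed-delay execution
    }

    // Also callable directly, corresponding to the manual library call.
    void runDistribution() {
        // 1. iterate over all JVMs and poll remote objects for measured times
        // 2. run the distribution algorithm of Section 4.1
        // 3. migrate non-resident objects to their optimal JVM
    }
}
```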
6. Evaluation

In order to evaluate the effectiveness and efficiency of our work, we examined two applications that have potential for locality optimizations. If a program were already distributed optimally at compile time and its locality did not change during runtime, there would be nothing to optimize. The first application is a numerical algorithm that has a static structure; we started with a sub-optimal distribution and optimized its locality during runtime. The second application is an n-body simulation with an inherently dynamic structure; we started with an optimal distribution and adapted the locality as the structure of the application changed. All measurements in this chapter have been conducted on our Carla cluster, using the Java Server VM 1.4.2_13-b06. This cluster consists of 16 nodes equipped with two Pentium III 800 MHz processors and 1 GB RAM each.

6.1. Successive over-relaxation

Successive over-relaxation (SOR) is a numerical algorithm for solving Laplace equations on a grid. The sequential implementation involves an outer loop for the iterations and two inner loops, each looping over the grid. During an iteration, the new value of each point of the grid is determined by calculating the average value of the four neighbor points. The algorithm terminates if no point of the grid has changed by more than a certain threshold. The parallel implementation [9] provided by Maassen is based on a red-black ordering mechanism. The grid is partitioned among the available processors, each processor receiving a number of adjacent rows. Before a processor starts to update the points of a certain color, it exchanges the border rows of the opposite color with its neighbors; a sketch of the red-black update follows below.

Figure 1. Results of the SOR benchmark (time [ms] per iteration over the number of machines; 1000x1000 grid, 300 iterations; manual, optimized, and random versions).

The SOR benchmark performs 300 iterations of successive over-relaxation on a 1000x1000 grid of double values. The performance was measured on 2, 4, 8, and 16 nodes and is reported in milliseconds per iteration. In order to evaluate our approach, we created three versions of the benchmark: (i) a manual version that creates all remote objects at their optimal location, (ii) a random version where the location of the remote objects is determined randomly, and (iii) an optimized version which invokes the locality optimizations after the first iteration, based on the random object distribution.
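For reference, a sequential sketch of one red-black SOR iteration; this is our own minimal rendering of the update rule, not Maassen's parallel code, which additionally partitions rows among processors and exchanges border rows:

```java
// Sequential sketch of red-black SOR on a grid of doubles.
class SOR {
    // Update only the points of one color ((i + j) % 2 == color).
    static void sweep(double[][] g, double omega, int color) {
        for (int i = 1; i < g.length - 1; i++) {
            for (int j = 1; j < g[i].length - 1; j++) {
                if ((i + j) % 2 != color) continue;
                double avg = (g[i - 1][j] + g[i + 1][j]
                            + g[i][j - 1] + g[i][j + 1]) / 4.0;
                g[i][j] += omega * (avg - g[i][j]);   // over-relaxed update
            }
        }
    }

    // One full iteration: first the red points, then the black points.
    static void iterate(double[][] g, double omega) {
        sweep(g, omega, 0);
        sweep(g, omega, 1);
    }
}
```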

The results of the SOR benchmark are shown in Figure 1. As expected, the manual version performs best, with a constant speedup as the number of machines increases. The random version performs worst and does not scale with additional machines. Finally, the optimized version of the benchmark performs considerably better than the random version, improving its performance towards the optimal version. If more iterations were performed, the optimized version would do even better, since the cost of the locality optimizations would carry less weight. Figure 1 might give the impression that the optimized version does not scale with additional machines. This is not exactly true, since the cost of the locality optimizations is proportional to the number of nodes, too. Table 2 details the cost of the procedure for the SOR benchmark. Polling the remote objects clearly dominates the overall cost. In spite of its quadratic complexity, the cost of the distribution algorithm is relatively small. Again, if the number of iterations were increased or a benchmark with longer processing times were used, the cost would decrease.

Table 2. Cost of the locality optimizations in ms: polling remote objects, computing the locality algorithm, cost of migrating objects, and overall cost.

6.3. N-body simulation

The n-body simulation approximates the movement of n particles in a two-dimensional space based on mutual gravitation. The simulation is discretized into time steps where the gravity between each pair of the n particles must be computed for each time step. Afterwards, acceleration and change in velocity and location are determined for each particle. In order to avoid the quadratic complexity of computing the forces, the present implementation uses an approximation proposed by Barnes and Hut. Through hierarchical grouping and generation of substitute masses for distant space regions, the computation complexity is reduced to O(n log n) operations per time step. We refer to [7] for a detailed description of the benchmark.

The benchmark performs 10 iterations of the n-body simulation with 1000 particles. The performance was measured on 2, 4, 8, and 16 nodes and is reported in seconds per iteration. Again, we created three versions of the benchmark: (i) a manual version with explicit placement annotations, (ii) a random version where the location of the remote objects is determined randomly, and (iii) an optimized version which invokes the locality optimizations after the first iteration, based on the random object distribution.

Figure 2 shows the results of the n-body benchmark. Because of the dynamic structure of the benchmark, an optimal distribution of the remote objects is hard to predict and depends on the spatial distribution of the particles. As the initial coordinates of the particles are determined randomly and thus are not known a priori, the manual version of the benchmark performs only slightly better than the random version. Since the locality of the application is adapted to the actual location of the particles, the optimized version of the benchmark performs best. The cost of the locality optimizations can easily be covered by the savings achieved during the following iterations.

Figure 2. Results of the n-body benchmark (time [s] per iteration over the number of machines; 1000 particles, 10 iterations; manual, optimized, and random versions).

The n-body benchmark is a good example of the effectiveness of our approach. In dynamic settings such as the n-body simulation, it is hard and sometimes impossible to determine a good initial distribution of the remote objects. Even if an optimal distribution can be determined, the performance of the initial distribution will decrease as the locality of the application changes. Only a dynamic approach that optimizes the locality at runtime can guarantee consistently high performance throughout the whole life cycle of the application.

7. Conclusion and future work

In this work, we presented runtime locality optimizations of distributed Java applications. Based on a static approach, we developed a dynamic methodology to automatically generate a distribution strategy for the objects of a distributed system. We instrumented stubs and skeletons to measure the execution time and communication cost of remote calls. The measured time values are stored locally to avoid communication overhead. The locality optimizations are implemented as a task that runs periodically or can be started on demand. This task collects the measured time values and computes an optimal distribution strategy. In order to realize the distribution strategy, objects are migrated between machines. We evaluated the effectiveness and efficiency of our work by optimizing two benchmark applications. The first benchmark is a typical example of a numerical algorithm with a static structure, so we created a random initial distribution of the objects and optimized their locality at runtime. The second benchmark has a dynamic structure, so that the performance of the initial object distribution, even of an optimal one, will deteriorate at runtime. We have shown that our approach is particularly suitable for such dynamic settings. In future work, we will focus on automatically adapting the period of the distribution task such that it reflects the processing time of the application. If the structure of the application does not change, we might even want to switch off the measuring completely. For large clusters with thousands of processors, or for applications with a great number of objects, an algorithm with quadratic complexity might be suboptimal. We could imagine a distributed algorithm that works with exact time values for only a couple of local nodes and extrapolates the values for remote nodes.

References

[1] G. Antoniu, L. Bouge, P. Hatcher, M. MacBeth, K. McGuigan, and R. Namyst, The Hyperion system: Compiling multithreaded Java bytecode for distributed execution, Parallel Computing.
[2] Y. Aridor, M. Factor, and A. Teperman, cJVM: a single system image of a JVM on a cluster, Parallel Processing, 1999.
[3] M. Factor, A. Schuster, and K. Shagin, A distributed runtime for Java: yesterday and today, Parallel and Distributed Processing Symposium.
[4] T. Fahringer, JavaSymphony: a system for development of locality-oriented distributed and parallel Java applications, Cluster Computing.
[5] V. Felea, R. Olejnik, and B. Toursel, ADAJ: a Java Distributed Environment for Easy Programming Design and Efficient Execution, Schedae Informaticae, UJ Press, Krakow, 2004.
[6] B. Haumacher, Lokalitätsoptimierung durch statische Typanalyse in JavaParty (Locality optimization through static type analysis in JavaParty), Diploma thesis, Institute for Program Structures and Data Organization, University of Karlsruhe.
[7] B. Haumacher, Plattformunabhängige Umgebung für verteilt paralleles Rechnen mit Rechnerbündeln (Platform-independent environment for distributed parallel computing on workstation clusters), PhD thesis, Institute for Program Structures and Data Organization, University of Karlsruhe.
[8] Intel Corp., Using the RDTSC Instruction for Performance Monitoring, SCPM1.HTM.
[9] J. Maassen and R. V. van Nieuwpoort, Fast parallel Java, Master's thesis, Dept. of Computer Science, Vrije Universiteit, Amsterdam.
[10] M. Philippsen and M. Zenger, JavaParty - Transparent Remote Objects in Java, Concurrency: Practice and Experience.
[11] M. Philippsen and B. Haumacher, Locality optimization in JavaParty by means of static type analysis, Proc. Workshop on Java for High Performance Network Computing at EuroPar '98, Southampton.
[12] M. Philippsen, B. Haumacher, and C. Nester, More Efficient Serialization and RMI for Java, Concurrency: Practice and Experience, John Wiley & Sons, Chichester, West Sussex, May 2000.
[13] Sun Microsystems, Java Native Interface.
[14] Sun Microsystems, Java Remote Method Invocation Specification, OC.html.
[15] R. Veldema, R. A. F. Bhoedjang, and H. E. Bal, Jackal, a compiler based implementation of Java for clusters of workstations, Proceedings of PPoPP.
[16] W. Zhu, C.-L. Wang, and F. C. M. Lau, JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support, IEEE Fourth International Conference on Cluster Computing, Chicago, USA.


More information

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert Ubiquitous Computing Ubiquitous Computing The Sensor Network System Sun SPOT: The Sun Small Programmable Object Technology Technology-Based Wireless Sensor Networks a Java Platform for Developing Applications

More information

Report of the case study in Sistemi Distribuiti A simple Java RMI application

Report of the case study in Sistemi Distribuiti A simple Java RMI application Report of the case study in Sistemi Distribuiti A simple Java RMI application Academic year 2012/13 Vessio Gennaro Marzulli Giovanni Abstract In the ambit of distributed systems a key-role is played by

More information

Language Based Virtual Machines... or why speed matters. by Lars Bak, Google Inc

Language Based Virtual Machines... or why speed matters. by Lars Bak, Google Inc Language Based Virtual Machines... or why speed matters by Lars Bak, Google Inc Agenda Motivation for virtual machines HotSpot V8 Dart What I ve learned Background 25+ years optimizing implementations

More information

On Performance of Delegation in Java

On Performance of Delegation in Java On Performance of Delegation in Java Sebastian Götz Software Technology Group, Dresden University of Technology, Germany sebastian.goetz@mail.inf.tu-dresden.de Mario Pukall Database Research Group, Otto-von-Guericke-University

More information

Jini. Kurzfassung als Kapitel für die Vorlesung Verteilte Systeme. (unter Nutzung von Teilen von Andreas Zeidler und Roger Kehr)

Jini. Kurzfassung als Kapitel für die Vorlesung Verteilte Systeme. (unter Nutzung von Teilen von Andreas Zeidler und Roger Kehr) Jini Kurzfassung als Kapitel für die Vorlesung Verteilte Systeme Friedemann Mattern (unter Nutzung von Teilen von Andreas Zeidler und Roger Kehr) Jini Infrastructure ( middleware ) for dynamic, cooperative,

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Java Real-Time Distributed Processing over Chorus/OS

Java Real-Time Distributed Processing over Chorus/OS Java Real-Time Distributed Processing over Chorus/OS Christophe Lizzi CS Technologies Informatiques lizzi@csti.fr CNAM, CEDRIC lab. lizzi@cnam.fr Outline Motivations Our vision Real-time Java Operating

More information

Performance Comparison of Server Load Distribution with FTP and HTTP

Performance Comparison of Server Load Distribution with FTP and HTTP Performance Comparison of Server Load Distribution with FTP and HTTP Yogesh Chauhan Assistant Professor HCTM Technical Campus, Kaithal Shilpa Chauhan Research Scholar University Institute of Engg & Tech,

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

Compiling Object Oriented Languages. What is an Object-Oriented Programming Language? Implementation: Dynamic Binding

Compiling Object Oriented Languages. What is an Object-Oriented Programming Language? Implementation: Dynamic Binding Compiling Object Oriented Languages What is an Object-Oriented Programming Language? Last time Dynamic compilation Today Introduction to compiling object oriented languages What are the issues? Objects

More information

Chapter 2: OS Overview

Chapter 2: OS Overview Chapter 2: OS Overview CmSc 335 Operating Systems 1. Operating system objectives and functions Operating systems control and support the usage of computer systems. a. usage users of a computer system:

More information

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Rouven Kreb 1 and Manuel Loesch 2 1 SAP AG, Walldorf, Germany 2 FZI Research Center for Information

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Preserving Message Integrity in Dynamic Process Migration

Preserving Message Integrity in Dynamic Process Migration Preserving Message Integrity in Dynamic Process Migration E. Heymann, F. Tinetti, E. Luque Universidad Autónoma de Barcelona Departamento de Informática 8193 - Bellaterra, Barcelona, Spain e-mail: e.heymann@cc.uab.es

More information

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

More information

Hardware/Software Co-Design of a Java Virtual Machine

Hardware/Software Co-Design of a Java Virtual Machine Hardware/Software Co-Design of a Java Virtual Machine Kenneth B. Kent University of Victoria Dept. of Computer Science Victoria, British Columbia, Canada ken@csc.uvic.ca Micaela Serra University of Victoria

More information

CS 575 Parallel Processing

CS 575 Parallel Processing CS 575 Parallel Processing Lecture one: Introduction Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5

More information

gprof: a Call Graph Execution Profiler 1

gprof: a Call Graph Execution Profiler 1 gprof A Call Graph Execution Profiler PSD:18-1 gprof: a Call Graph Execution Profiler 1 by Susan L. Graham Peter B. Kessler Marshall K. McKusick Computer Science Division Electrical Engineering and Computer

More information

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing January 2014 Legal Notices JBoss, Red Hat and their respective logos are trademarks or registered trademarks of Red Hat, Inc. Azul

More information

Building Scalable Applications Using Microsoft Technologies

Building Scalable Applications Using Microsoft Technologies Building Scalable Applications Using Microsoft Technologies Padma Krishnan Senior Manager Introduction CIOs lay great emphasis on application scalability and performance and rightly so. As business grows,

More information

Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment

Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment Nuno A. Carvalho, João Bordalo, Filipe Campos and José Pereira HASLab / INESC TEC Universidade do Minho MW4SOC 11 December

More information

Introduction CORBA Distributed COM. Sections 9.1 & 9.2. Corba & DCOM. John P. Daigle. Department of Computer Science Georgia State University

Introduction CORBA Distributed COM. Sections 9.1 & 9.2. Corba & DCOM. John P. Daigle. Department of Computer Science Georgia State University Sections 9.1 & 9.2 Corba & DCOM John P. Daigle Department of Computer Science Georgia State University 05.16.06 Outline 1 Introduction 2 CORBA Overview Communication Processes Naming Other Design Concerns

More information

Technical Research Paper. Performance tests with the Microsoft Internet Security and Acceleration (ISA) Server

Technical Research Paper. Performance tests with the Microsoft Internet Security and Acceleration (ISA) Server Technical Research Paper Performance tests with the Microsoft Internet Security and Acceleration (ISA) Server Author: Martin Eisermann Date: 2002-05-13 City: Bad Aibling, Germany Annotations: This research

More information

Various Schemes of Load Balancing in Distributed Systems- A Review

Various Schemes of Load Balancing in Distributed Systems- A Review 741 Various Schemes of Load Balancing in Distributed Systems- A Review Monika Kushwaha Pranveer Singh Institute of Technology Kanpur, U.P. (208020) U.P.T.U., Lucknow Saurabh Gupta Pranveer Singh Institute

More information

Web Services. Copyright 2011 Srdjan Komazec

Web Services. Copyright 2011 Srdjan Komazec Web Services Middleware Copyright 2011 Srdjan Komazec 1 Where are we? # Title 1 Distributed Information Systems 2 Middleware 3 Web Technologies 4 Web Services 5 Basic Web Service Technologies 6 Web 2.0

More information

Performance Measurement of Dynamically Compiled Java Executions

Performance Measurement of Dynamically Compiled Java Executions Performance Measurement of Dynamically Compiled Java Executions Tia Newhall and Barton P. Miller University of Wisconsin Madison Madison, WI 53706-1685 USA +1 (608) 262-1204 {newhall,bart}@cs.wisc.edu

More information

Real-Time Monitoring Framework for Parallel Processes

Real-Time Monitoring Framework for Parallel Processes International Journal of scientific research and management (IJSRM) Volume 3 Issue 6 Pages 3134-3138 2015 \ Website: www.ijsrm.in ISSN (e): 2321-3418 Real-Time Monitoring Framework for Parallel Processes

More information

PERFORMANCE MONITORING OF JAVA COMPONENT-ORIENTED DISTRIBUTED APPLICATIONS

PERFORMANCE MONITORING OF JAVA COMPONENT-ORIENTED DISTRIBUTED APPLICATIONS PERFORMANCE MONITORING OF JAVA COMPONENT-ORIENTED DISTRIBUTED APPLICATIONS Adrian Mos, John Murphy Performance Engineering Lab, Dublin City University Glasnevin, Dublin 9, Ireland Tel: +353 1 700-8762,

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information