DISK: A Distributed Java Virtual Machine For DSP Architectures
Mihai Surdeanu
Southern Methodist University, Computer Science Department
[email protected]

Abstract

Java is arguably the most popular programming language today. Its popularity rests on the portability of its code and on the elegant programming framework the language provides: built-in garbage collection and multi-threading support, easy Internet application development through socket streams, and, last but not least, a familiar syntax. Although a great deal of work has already been done on Java Virtual Machines (JVMs) and a large number of Java applications are available, few of these have penetrated the DSP world. Our work aims to bridge this gap between DSP architectures and Java. Such a connection would open the DSP world to a much larger class of consumers, a space now occupied by general-purpose processors. We see two main research directions for improving the performance of a JVM on DSP architectures: (1) writing a Java bytecode Just-In-Time compiler optimized for a specific DSP platform, and (2) taking advantage of the parallelism available in multi-processor DSP architectures. Our work focuses on the latter. We propose a distributed JVM designed for a multi-processor DSP architecture. Because the hardware is not available at the moment, we have implemented our ideas on a network of general-purpose processors, emulating the topology of the quad-processor DSP board built by Innovative Integration. We emulate the I/O channels connecting the processors with TCP sockets, and interrupts with Unix signals. All the protocols use a constant number of messages, making it possible to estimate the cost of every distributed action. In this document we present the design of DISK and the test results obtained with it. While a distributed JVM architecture may not prove an exciting project for highly specialized DSP processors (such as TI's TMS320 family), we believe that this work can be successfully applied to multiprocessor architectures based on RISC processors enhanced with signal-processing capabilities (such as the new ARM9E processor). Other projects are already working on porting Java to this class of processors; on top of such ports we provide a Java parallel-processing platform that binds the power of Java distributed computing with signal processing.
1 Introduction

Java is slowly starting to penetrate the DSP world. Both JDK [3] and Kaffe [5] have ports available for ARM processors. Nevertheless, the world of DSP multiprocessing remains unexplored by Java environments and applications. There are JVMs for multiprocessor architectures built with general-purpose processors, such as Solaris or NT SMPs, but none for DSP multiprocessors. The reason is not only that Java ports for DSP architectures have become available only recently, but also that most (if not all) DSP multiprocessor systems lack shared memory. The C6x multiprocessor boards built by Blue Wave Systems [1] and Innovative Integration [2] have a separate memory for each processor on board, and inter-processor communication (where available) is performed through dedicated I/O channels. Hence, even though the processors are tightly connected, these architectures are better described as multicomputers, where inter-processor communication happens through message passing, not shared memory. Another architecture fitting the same classification is a cluster of ARM processors connected by a high-speed network.

The Java Virtual Machine Specification (JVMS) [4] assumes that shared memory is available. Once this assumption no longer holds, as is the case with DSP multiprocessors, one needs a distributed shared memory protocol to provide the illusion of shared memory. It can be shown that, under the assumption that programs are data-race free, the JVMS is equivalent to Release Consistency [6], a well-known consistency model for distributed shared memories; proving this equivalence is beyond the scope of this paper. Instead, we describe a distributed JVM architecture built upon the Release Consistency model, named DISK (DIStributed Kaffe). As the name implies, we built our system on top of the Kaffe Virtual Machine (we used version 1.0.b3). DISK provides a distributed Java environment in which threads are transparently allocated to processors by a customizable processor allocation algorithm, and the object heap is kept consistent using one of four protocols implementing Release Consistency. DISK is implemented entirely at the JVM level: we do not require any changes to the Java language or compiler.

The DISK architecture and protocols are described in section 2. DISK is a distributed-shared-memory system implemented from scratch to take advantage of Java's unique features (object orientation, just-in-time compilation). Section 3 presents the test results for three benchmark applications, comparing the performance of four protocols implementing Release Consistency; such a study is not yet available for the Java language. Our results indicate that there is no clear winner: each protocol performs best under specific circumstances. This suggests that a runtime-adaptive protocol might be the best overall solution. Section 4 concludes the paper.

2 Architecture

In this section we describe the DISK architecture and the protocols used to implement the Release Consistency model. The block diagram of a DISK node is shown in figure 2.1. As the figure shows, we use the Kaffe VM to execute Java bytecode, and the interaction between Kaffe and DISK's distributed environment is done through software barriers. Release Consistency is maintained over the cluster of workstations through the protocols implemented in the RC Protocols module. The object, lock, and thread management module is present only on master processors.
This module tracks the location of every global object and lock, and the processor to which every thread is allocated. We currently use one master processor per cluster, but the management tasks could easily be distributed over a larger number of processors.
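For illustration, the management module's bookkeeping boils down to a set of registries mapping system-wide ids to processors. The following is a minimal Java sketch under assumed names, not DISK's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the master's object/lock/thread registries.
// The real module also locks individual object records while a WRITE
// operation migrates ownership (see the protocol semantics below).
class MasterRegistry {
    private final Map<Long, Integer> objectOwner = new ConcurrentHashMap<>();
    private final Map<Long, Integer> lockOwner = new ConcurrentHashMap<>();
    private final Map<Long, Integer> threadHost = new ConcurrentHashMap<>();

    // Returns the processor holding the object, or -1 if unknown.
    int lookupObject(long objectId) {
        return objectOwner.getOrDefault(objectId, -1);
    }

    // Records the new location of an object after ownership migrates.
    void recordObjectMove(long objectId, int newOwner) {
        objectOwner.put(objectId, newOwner);
    }

    // Records the new location of a lock after it migrates.
    void recordLockMove(long lockId, int newOwner) {
        lockOwner.put(lockId, newOwner);
    }

    // Records the processor a newly created thread was allocated to.
    void recordThread(long threadId, int processor) {
        threadHost.put(threadId, processor);
    }
}
```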
The communication layer is in charge of managing the input and output communication channels. We have currently implemented the communication layer on top of TCP, but it could easily be ported to any reliable communication protocol. The RC-compliant protocols and the management module communicate in a topology-independent way through the services offered by a light Remote Procedure Call (RPC) protocol.

Figure 2.1: Block diagram of a DISK node (Java threads run on the Kaffe VM; software barriers connect the VM to the RC protocols and to the object/lock/thread management module, which communicate through a light RPC layer built over the TCP communication layer).

DISK is implemented over a bus-connected cluster of workstations. On top of this, the communication layer constructs software point-to-point channels between all nodes. The resulting topology was highly influenced by the quad-processor C6x board designed by Innovative Integration; it is depicted in figure 2.2 for a system with four processors.

Figure 2.2: Point-to-point-connected system with centralized I/O (only the master processor connects to the external network).

Note that only one processor (the master processor) has full I/O access. This is because the master processor also runs a file server that provides I/O transparency for the Java threads. To reduce initialization overhead we give the slave processors partial I/O capabilities, allowing them to load Java classes independently at startup. This design choice yielded roughly two times faster system initialization than a system in which all I/O accesses go through the master processor. On a tightly connected DSP multiprocessor board, the default Java classes could be stored in processor-local ROM for speedup (Kaffe's classes fit in a 5 KB buffer).
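The software point-to-point channels mentioned above are emulated over TCP sockets. A minimal sketch of such a channel, with length-prefixed packet framing so that a dispatcher can read exactly one packet per wake-up, might look as follows (illustrative Java; class and method names are assumptions, not DISK's actual code):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// Hypothetical sketch of one software point-to-point channel.
// Packets are framed with a 4-byte length prefix.
class Channel {
    private final DataInputStream in;
    private final DataOutputStream out;

    Channel(Socket socket) throws IOException {
        in = new DataInputStream(socket.getInputStream());
        out = new DataOutputStream(socket.getOutputStream());
    }

    // Serialized so that concurrent senders cannot interleave frames.
    synchronized void send(byte[] packet) throws IOException {
        out.writeInt(packet.length); // frame header: payload size
        out.write(packet);
        out.flush();
    }

    // Blocks until a complete packet has arrived on this channel.
    byte[] receive() throws IOException {
        int length = in.readInt();
        byte[] packet = new byte[length];
        in.readFully(packet);
        return packet;
    }
}
```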
The block diagram of the communication layer is presented in figure 2.3.

Figure 2.3: DISK communication layer (a buffer pool shared by the dispatcher, worker, and client threads, connected to the input and output channels).

Each DISK node supports three types of threads:

Dispatcher threads - Each input channel is monitored by a dispatcher thread. The dispatchers are dormant most of the time; they are awakened by the thread system whenever a packet arrives on the monitored channel. The dispatcher reads the packet from the channel into a memory buffer. If the message contained in the packet is a reply, the thread waiting for it is awakened. If the message is a request, a worker thread is assigned to serve it.

Worker threads - Each processor has a pool of worker threads. A worker is created when a request is waiting to be serviced and the worker pool is empty. When a worker finishes its assigned request, it returns to the pool, where it waits for the next job.

Client threads - Client threads are created by the Java application. For any program, at least one client thread is created to run the main method.

To support this organization and to avoid excessive dynamic allocation at runtime (garbage collection is expensive), a pool of buffers is created at system initialization and extended dynamically according to the application's needs. These buffers are used by the dispatcher threads when incoming packets are read from the I/O channels, and by any other thread that needs to build a packet. A buffer returns to the pool when it is no longer needed.

With a DSP environment in mind, we assume that virtual memory is not available; hence we do not use the virtual memory management unit to implement the consistency protocol. Instead, we modify Kaffe's Just-In-Time (JIT) compiler to call our own consistency functions before any read or write access to object data. We modified the JIT code for the following bytecode instructions: PUTFIELD, to insert a write barrier before object accesses; GETFIELD, to insert a read barrier before object accesses; IASTORE, LASTORE, FASTORE, DASTORE, AASTORE, BASTORE, CASTORE, and SASTORE, to insert write barriers before array accesses; and IALOAD, LALOAD, FALOAD, DALOAD, AALOAD, BALOAD, CALOAD, and SALOAD, to insert read barriers before array accesses. In the current DISK version we do not change the PUTSTATIC and GETSTATIC instructions; hence we do not offer consistency support for class (static) data. A conceptual sketch of the software barriers appears below.
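Conceptually, every instrumented access is preceded by a check on the local copy of the object. The following Java-level sketch conveys the idea; in DISK the checks are emitted as JIT-compiled code, and the SharedObject interface and its method names here are hypothetical:

```java
// Hypothetical sketch of the software barriers inserted by the JIT.
interface SharedObject {
    boolean hasValidLocalCopy();  // true if a READ can proceed locally
    boolean isLocallyWritable();  // true if a WRITE can proceed locally
    void fetchCopy();             // obtain a valid copy (READ protocol)
    void acquireOwnership();      // migrate ownership (WRITE protocol)
}

class Barriers {
    // Inserted before GETFIELD and the array load instructions.
    static void readBarrier(SharedObject obj) {
        if (!obj.hasValidLocalCopy()) {
            obj.fetchCopy(); // blocks until the copy arrives
        }
    }

    // Inserted before PUTFIELD and the array store instructions.
    static void writeBarrier(SharedObject obj) {
        if (!obj.isLocallyWritable()) {
            obj.acquireOwnership(); // locks the master's object record,
                                    // migrates the object, then unlocks
        }
    }
}
```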
Because of the software barriers, the semantics of the above instructions, plus MONITOR_ENTER, MONITOR_EXIT, and thread creation, are changed. To keep things familiar for readers not used to the Java instruction set, we refer to the instructions that require read barriers as READ and to those that require write barriers as WRITE, and we rename MONITOR_ENTER and MONITOR_EXIT to LOCK and UNLOCK. We also use the name TCREATE for the thread create operation. The distributed semantics of these instructions are presented in figure 2.4.

Figure 2.4: Distributed semantics for bytecode instructions. (a) READ; (b) WRITE; (c) LOCK; (d) TCREATE.

In figure 2.4 we denote by M the master processor, by S1 the (slave) processor trying to access a shared resource, and by S2 the processor holding the resource. The semantics of READ and LOCK are straightforward. During a WRITE operation we must make sure that the object location recorded by the master processor is protected: message (1) locks the object record on the master processor, and message (4) unlocks it after the new location of the object has been recorded. Message (1) of the TCREATE operation is sent to the master processor in response to a call to java.lang.Thread.start(), the Java method that starts a new thread. The master processor allocates S3 as the host processor of the new thread, based on some processor allocation algorithm (currently we use round-robin allocation). Message (2) is sent to the processor S2 that currently owns the java.lang.Thread object corresponding to the new thread, and message (3), containing a copy of this object, is sent to processor S3, where the new thread is created. Note that no reply is expected: processor S1 resumes execution as soon as message (1) is successfully delivered.

Note that the operations described in figure 2.4 apply only to shared objects that are not available locally. DISK detects when an object is not shared (i.e., when it is accessed only by threads located on the same processor), in which case the software barrier is greatly simplified.

DISK provides Release Consistency (RC), a JVMS-compliant consistency model, for shared objects. Informally, RC means that changes to objects guarded by locks are sent to remote processors only when the lock is released. We refer the reader interested in a formal definition of RC to [6]. As of today there are two main flavors of RC: Eager Release Consistency (ERC), which forces the propagation of changes at the UNLOCK operation, and Lazy Release Consistency (LRC) [7], which delays sending the changes to shared objects until the time of the next LOCK operation. [7] shows that the two models are equivalent for programs in which all conflicting accesses to shared objects are protected by synchronization mechanisms (data-race-free programs). Just like cache coherence protocols, both ERC and LRC can be invalidate-based (only invalidation messages are sent) or update-based (the modified data is appended to the consistency messages). A protocol implementing RC may maintain a single writable copy of each object (similar to cache coherence protocols) or allow multiple writable copies of the same object [8][10]. At the present stage we implement only single-writer protocols, but we are planning to investigate the performance of multiple-writer protocols as well. The small fragment below illustrates the informal description of RC given above.
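This is plain Java; the comments describe what RC permits for a data-race-free program, not an extra DISK API:

```java
// A data-race-free fragment: all conflicting accesses to `value`
// are protected by the same lock. Under Release Consistency the
// update only has to reach remote processors at a release: eagerly
// at UNLOCK under ERC, or lazily at the next LOCK of the same lock
// under LRC. A data-race-free program cannot tell the difference.
class SharedCounter {
    private int value;

    synchronized void increment() { // MONITOR_ENTER -> LOCK
        value = value + 1;          // preceded by a WRITE barrier
    }                               // MONITOR_EXIT -> UNLOCK: the change
                                    // may now be propagated or invalidated

    synchronized int get() {        // LOCK: under LRC, pending changes
        return value;               // from the last releaser arrive here
    }
}
```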
At this point, DISK supports the following four protocols:

- LRC, single writer, invalidate
- LRC, single writer, update
- ERC, single writer, invalidate
- ERC, single writer, update

The performance of these protocols is analyzed in section 3.

3 Test Results

We have tested the performance of the protocols introduced in section 2 on the following three benchmark applications:

PMC - A producer, multiple consumers problem. The producer repeatedly writes to a shared object and blocks until all consumers have read the resource. The tests presented here were performed with 8 consumers and 1 iterations.

JCB - An implementation of the Jacobi method for solving partial differential equations. The problem input is a two-dimensional matrix; during each iteration, every element is updated to the average of its nearest neighbors, and each Java thread gets a number of adjacent rows to work on (see the sketch after this list). These tests were performed on a 24x24 matrix, with 8 threads and 25 iterations.

MM - A parallel matrix multiplication algorithm, in which each element of the output matrix is computed by a separate thread. We used 2x2 matrices for this application.

Note that these applications are not written to take advantage of the available parallelism. Instead, the purpose is to stress the system with a large number of synchronization points, in order to evaluate the relative performance of the proposed protocols. Because of this, the benchmarks do not scale well as the number of processors increases; we performed the tests on 2, 3, and 4 processors. Nevertheless, we believe that these programs are well suited to testing the relative performance of the protocols introduced in section 2. Even though the applications are simple, they span a large class of real-life applications: PMC uses a small number of shared objects, but they are repeatedly accessed. JCB also has a small number of shared objects, but their size is much larger than the average Java object size (each row is a separate object). MM uses a large number of shared objects (each matrix element is a separate object), but the shared object size is much smaller than the average object size.
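For concreteness, the JCB kernel could be sketched as follows. This is hypothetical code, not the benchmark source; the per-iteration synchronization and the array swap are only indicated in comments:

```java
// Hypothetical sketch of the JCB (Jacobi) kernel: each thread owns a
// band of adjacent rows and averages each element's nearest neighbors.
// Requires firstRow >= 1 and lastRow <= a.length - 2 (boundary rows
// and columns are fixed).
class JacobiWorker extends Thread {
    private final double[][] a, next; // each row is a separate object
    private final int firstRow, lastRow, iterations;

    JacobiWorker(double[][] a, double[][] next,
                 int firstRow, int lastRow, int iterations) {
        this.a = a; this.next = next;
        this.firstRow = firstRow; this.lastRow = lastRow;
        this.iterations = iterations;
    }

    public void run() {
        for (int it = 0; it < iterations; it++) {
            for (int i = firstRow; i <= lastRow; i++) {
                for (int j = 1; j < a[i].length - 1; j++) {
                    next[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                                       + a[i][j - 1] + a[i][j + 1]);
                }
            }
            // The real benchmark synchronizes here: all threads finish
            // the iteration before the a/next arrays are swapped.
        }
    }
}
```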
The results are presented in figures 3.1.a-3.1.e for PMC, 3.2.a-3.2.e for JCB, and 3.3.a-3.3.e for MM. For each application we measured the additional memory needed by the protocol (as a percentage of heap space), the execution time (in seconds), the number and size (in KB) of packets sent throughout the execution, and the number and size of shared objects and locks versus the total number and size of all objects and locks. Each chart compares the lazy and eager protocol variants.

Figure 3.1.a: Protocol additional memory (PMC)
Figure 3.1.b: Execution time (PMC)
Figure 3.1.c: Packet count (PMC)
Figure 3.1.d: Packet data (PMC)
Figure 3.1.e: Shared versus total objects/locks (PMC)
Figure 3.2.a: Protocol additional memory (JCB)
Figure 3.2.b: Execution time (JCB)
Figure 3.2.c: Packet count (JCB)
Figure 3.2.d: Packet data (JCB)
Figure 3.2.e: Shared versus total objects/locks (JCB)
Figure 3.3.a: Protocol additional memory (MM)
Figure 3.3.b: Execution time (MM)
Figure 3.3.c: Packet count (MM)
Figure 3.3.d: Packet data (MM)
Figure 3.3.e: Shared versus total objects/locks (MM)
PMC uses few shared objects, so the additional memory required by all protocols is negligible (see figure 3.1.a). The interesting result for this benchmark is the execution times presented in figure 3.1.b: the update protocols (both the LRC and the ERC variants) perform much better than their invalidate equivalents. The explanation lies in the size of the shared objects (see figure 3.1.e). For PMC, the average shared object size ranges between 70 and 80 bytes; even though this is about two times the average Java object size reported in [12], it is still small. Hence, as figures 3.1.b and 3.1.c show, appending object data to consistency messages reduces the execution time by reducing the number of consistency messages sent. Obviously, the update protocols send more data than the invalidate protocols, but the difference is small, as figure 3.1.d shows. Together, figures 3.1.c and 3.1.d show that update protocols reduce the number of messages at the expense of sending slightly more data. This indicates that update protocols are better suited to systems where the per-message transfer overhead is large (e.g., TCP) and one wants to reduce the message count as much as possible. Conversely, invalidate protocols suit systems where the main concern is the amount of data transferred and the transfer initialization overhead is negligible (e.g., DSP processors tightly connected on the same board).

Nevertheless, update protocols must be used with care. As figure 3.2.d shows, for JCB, where the average shared object size is large (roughly 1 KB), the update protocols end up sending at least five times more data, even though their message count is lower than that of the invalidate protocols. As figure 3.2.b shows, this yields execution times almost two times longer for the update protocols. This suggests that the best performance would probably be achieved by an adaptive protocol, capable of deciding between update and invalidate based on the object size. We plan to investigate such a protocol in the near future.

Figure 3.3.b shows relatively close performance for all four protocols on the MM benchmark. The troubling result for this application is the large protocol overhead (almost 25% of the heap space). It is caused by the large number of very small shared objects: each element of the output matrix is a word-sized shared object. A large number of small objects is the worst scenario for DISK, since every shared object is managed separately: each one is allocated a separate header storing information such as the object state, the system-wide id, and the queue of threads waiting for a valid copy of the object (see the sketch below). This header adds significant overhead when the object size is small. While we are working on reducing the shared object header, these results are also a hint for the programmer: a wise programmer should keep the number of shared objects (i.e., objects referred from a Thread object) to a minimum.

Another interesting observation is that the ERC protocols, generally dismissed as performing worse than LRC, compete successfully with the LRC protocols: in two out of three applications, the ERC protocols outperform their LRC equivalents. This can be explained by the relatively uniform distribution of shared object accesses in all studied applications: broadcasting changes to all processors when locks are released, the way ERC protocols work, proves to be the better solution here.
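The per-object header referred to above might look roughly like the following. The field set is a sketch based on the information listed in the text; the actual DISK layout is not shown here:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a per-shared-object header. For a word-sized
// object (a few bytes of user data), a header like this dominates the
// object's memory footprint, which explains the overhead seen for MM.
class SharedObjectHeader {
    enum State { VALID, INVALID, WRITABLE }

    State state;          // consistency state of the local copy
    long systemWideId;    // identifies the object across the cluster
    final Queue<Thread> waiters = new ArrayDeque<>(); // threads waiting
                          // for a valid copy of this object
}
```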
4 Conclusions

We have introduced in this paper a distributed Java virtual machine architecture named DISK. Currently DISK is implemented on a cluster of general-purpose processors, but it is immediately portable to DSP architectures.
All the hardware and operating system features DISK requires are already present on DSP platforms: we use the Kaffe virtual machine (already ported to ARM processors) to execute Java bytecode, and we do not use the virtual memory management unit to implement the consistency protocol; instead, we modified Kaffe's just-in-time compiler to insert calls to our consistency functions before any object access. These features make DISK portable to virtually any architecture.

DISK implements the Release Consistency model, a memory consistency model equivalent to the Java virtual machine specification, and currently supports four protocols implementing Release Consistency. The analysis of these protocols shows that, unlike for page-based distributed shared memories, there is no clear winner for DISK: different factors, such as the speed of the interconnection network, the average shared object size, and the distribution of objects, favor one protocol over another. This suggests that an adaptive system, capable of dynamically switching between protocols, may be the best solution.

References

1. Blue Wave Systems.
2. Innovative Integration.
3. ARM.
4. The Java Virtual Machine Specification.
5. The Kaffe Project.
6. K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In Proceedings of the Seventeenth International Symposium on Computer Architecture, May 1990.
7. P. Keleher. Distributed Shared Memory Using Lazy Release Consistency. Ph.D. thesis, Rice University, 1994.
8. C. Amza, A. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. TreadMarks: Shared Memory Computing on Networks of Workstations. IEEE Computer, vol. 29, no. 2, February 1996.
9. C. Amza, A. Cox, S. Dwarkadas, L.J. Jin, R. Rajamony, and W. Zwaenepoel. Adaptive Protocols for Software Distributed Shared Memory. Proceedings of the IEEE, special issue on distributed shared memory, March 1999.
10. J.B. Carter. Design of the Munin Distributed Shared Memory System. Journal of Parallel and Distributed Computing, special issue on distributed shared memory, 1995.
11. A.S. Tanenbaum. Distributed Operating Systems. Prentice-Hall, 1995.
12. S. Dieckmann and U. Hölzle. A Study of the Allocation Behavior of the SPECjvm98 Java Benchmarks. UCSB Technical Report TRCS98-33, December 1998.