Load Balancing MPI Algorithm for High Throughput Applications
Igor Grudenić, Stjepan Groš, Nikola Bogunović
Faculty of Electrical Engineering and Computing, University of Zagreb
Unska 3, Zagreb, Croatia
{igor.grudenic, stjepan.gros,

Abstract. MPI (Message Passing Interface) is a standard API (Application Programming Interface) used to create distributed applications. Development of distributed MPI software is a challenging task in which many difficulties may arise; common problems include load balancing, deadlocks and livelocks. Various methods have been developed to tackle the complexity of parallel programming. In this paper we present a load balancing algorithm (LBA) that can be used instead of developing a custom parallel application. The algorithm uses the MPI transport mechanism for internal signaling and synchronization. The LBA is most suitable for tasks that involve high and continuous data throughput; its performance will degrade significantly if it is used in complex computation topologies.

Keywords. Distributed computing, MPI, load balancing, scheduling.

1. Introduction

Distributed computing has become more popular in the last decade due to advances in hardware and communication networks. The increase in computing power has enabled very complex computations to be accomplished within reasonable time. In order to gain performance in a distributed environment, traditional sequential algorithms must be redesigned. Some algorithms are inherently parallel or can be parallelized without compromising performance [2][10], while others cannot benefit from a distributed environment.

There are two basic programming paradigms used in the development of distributed software: shared memory and message passing. The message passing model is much more scalable, but its use raises issues such as deadlock and livelock occurrence, data corruption, and performance problems. A number of methods have been proposed to ease the development of message passing applications, including parallelizing compilers [1], modeling and verification tools [11][5], performance analyzers [9], run-time checkers/analyzers [6][7] and parallel debuggers [13].

In this paper we provide a message passing algorithm (LBA) that can be used to schedule multiple small independent jobs with high data throughput. Typical applications include high volume data compression, video editing, encoding and decoding, 3D scene rendering, etc. Cluster schedulers dedicated to cluster resource management [4][8] are not suitable for this task because they operate on a large scale and are optimized for long jobs with moderately sized input and output data. Additionally, cluster schedulers do not optimize network utilization, and the availability of a job's input data is not matched to the availability of the computation resources. The scheduler in [12] is designed for batch queuing and targets high throughput of short batch jobs, but it is built to work with shared high performance storage and therefore disregards the data availability issue.

In section 2 we provide a description of the LBA architecture, as well as possibilities for its expansion. The LBA scheduling algorithm that implements part of the LBA architecture is presented in section 3. In section 4 we describe a test scenario for the implemented algorithm together with performance results.

2. LBA Architecture

The LBA architecture is designed to provide an easily extendable basis for parallel computation.
It consists of a single element that enables parallel computation and a pipe and filter architecture that can be used to chain the computations of multiple single elements in complex data processing scenarios. The single LBA element is described in section 2.1 and the composition of elements is outlined in section 2.2.

Figure 1. LBA Architecture (a single LBA element: developer provided job generator and result handler, a scheduling component, and computing components 1..N that access the developer provided computing utility through the computing API)

2.1. Single LBA element

A single LBA element (Fig. 1) contains a scheduling component and computing components. The scheduling component is used to schedule jobs and dispatch data to the computing components. Computing components are designed to perform the application logic: they receive input data and return computation results to the scheduling component. Various statistical data are returned together with the results in order to support the scheduler's decision making process. The scheduling component and each of the computing components are executed on separate MPI nodes.

In order to make the LBA architecture generic and easily extendable, two APIs are defined: the scheduling API and the computing API. The scheduling API is used to provide support for complex job dependencies as well as to enable custom scheduling policies. Complex job dependencies are managed by the developer provided job generator, which is connected to the scheduler through the scheduling API. The scheduler gets a list of available jobs from the job generator and assigns them, together with the accompanying data, to the computing components. The computing components execute the assigned jobs through the provided computing utility and send the results to the scheduling component. The scheduling component forwards the received results to the provided result handler. The result handler is developer provided and can be connected to the job generator within the same single LBA element in order to furnish new data for computation or to enable new jobs for execution. This enables the developer to create and enforce the arbitrarily complex job dependencies that are needed to complete a high level distributed task.

2.2. Pipes and Filters

Single elements can be composed into complex topologies using the well-known pipe and filter architecture [3]. A single LBA element is used as a filter, and the underlying data transfer (pipe) can be implemented by any technology available on the given computer cluster. Data flow between two LBA elements is accomplished by connecting the result handler in the source LBA element to the job generator in the sink LBA element.

Startup and data transfer between LBA elements are the two main issues in the proposed pipe and filter architecture. In the MPI environment all filters (and the corresponding scheduling and computing components) must be assigned to MPI compute nodes; the component that determines the layout of the entire system should be provided by the developer. Data transfer between LBA components can be done using MPI or another provided protocol. MPI transfer is potentially unsafe because there are threading issues in MPI implementations, and it is assumed that the job generator, scheduling component and result handler run in separate execution threads.
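The paper does not give concrete signatures for the scheduling and computing APIs, so the following Python sketch only illustrates the kind of developer-facing interfaces the description above implies. All names here (JobGenerator, ResultHandler, ChainedResultHandler, next_jobs, handle_result, and so on) are our own assumptions, not part of the LBA implementation.

# Hypothetical sketch of the developer-facing LBA interfaces described in section 2.1.
from abc import ABC, abstractmethod
from typing import List, Tuple

Job = Tuple[str, bytes]   # (job id, input data) -- assumed representation


class JobGenerator(ABC):
    """Supplies available jobs to the scheduling component (scheduling API)."""

    @abstractmethod
    def next_jobs(self) -> List[Job]:
        """Return the jobs that are currently eligible for execution."""

    @abstractmethod
    def jobs_available(self) -> bool:
        """True while more jobs may still be produced."""


class ResultHandler(ABC):
    """Receives results forwarded by the scheduling component."""

    @abstractmethod
    def handle_result(self, job_id: str, output: bytes) -> None:
        ...


class QueueJobGenerator(JobGenerator):
    """A trivial job generator backed by an in-memory list."""

    def __init__(self) -> None:
        self._pending: List[Job] = []
        self._closed = False

    def add_job(self, job: Job) -> None:
        self._pending.append(job)

    def close(self) -> None:
        self._closed = True

    def next_jobs(self) -> List[Job]:
        jobs, self._pending = self._pending, []
        return jobs

    def jobs_available(self) -> bool:
        return bool(self._pending) or not self._closed


class ChainedResultHandler(ResultHandler):
    """Pipe-and-filter composition (section 2.2): results of one LBA element
    become jobs of the next element by feeding its job generator."""

    def __init__(self, downstream: QueueJobGenerator) -> None:
        self.downstream = downstream

    def handle_result(self, job_id: str, output: bytes) -> None:
        self.downstream.add_job((job_id, output))

In this reading, chaining two LBA elements amounts to handing the source element a ChainedResultHandler that wraps the sink element's job generator; the actual data transfer (the pipe) would still be whatever transport the cluster provides.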
3. LBA Scheduling Algorithm

While implementing the LBA element we have provided a simple scheduling algorithm designed to perform well under the following assumptions:

1. jobs are independent,
2. input data is much larger than the output data,
3. all compute nodes are able to execute any of the jobs,
4. compute nodes do not have the same processing power (heterogeneous environment).

Signaling between the scheduling component and the computing components is done using MPI. This includes information about the status of the compute nodes, together with information on the pending jobs. The transfer of input and output job data is accomplished using operating system provided resources and is configurable. The algorithm for the scheduling component is given in section 3.1, while the algorithms for the computing component and computing utility are presented in section 3.2.

3.1. Scheduling component algorithm

The scheduling component (Fig. 2) is designed to run while there are jobs available. After a job is acquired from the job generator, a target compute node (targetnode) must be determined. Before the decision, the scheduling component updates its information with the statistical data queued up from the compute nodes. After the update of the info object, which holds all the information about the status of the computing components, the best computing component for the job is picked (findbestnode).

Finding the best compute node for a job is completed in two phases. In the first phase, computing components not suitable for the job are filtered out. The criteria for elimination of compute nodes include inadequate storage size for the input data, the number of jobs pending for execution on the compute node, and the amount of input data that still has to be transferred for pending jobs. In the second phase the best node is picked using a similar policy. If all computing components are filtered out in the first phase, the algorithm continues to gather statistics until at least one computing component becomes eligible for the job.

Scheduling component {
  while (jobsavailable()) {
    job = nextjob();
    targetnode = none;
    while (targetnode == none) {
      info += getstatistics(compute_nodes);
      targetnode = findbestnode(job, info);
    }
    initiatedatastagingin(job, targetnode);
    initiatedatastagingout(completedjobs);
    while (existsfinishedstaging()) {
      (job, target_node) = getstaged();
      Notify(job, target_node);
    }
  }
  while (!alljobscompleted(info)) {
    while (existsfinishedstaging()) {
      (job, target_node) = getstaged();
      Notify(job, target_node);
    }
    info += getstatistics(compute_nodes);
  }
}

Figure 2. Scheduling component algorithm

When a job is matched with a computing component, data staging is initiated (initiatedatastagingin) for that job. Data staging is the process of preloading job input data to the node that is scheduled to execute the job. Initiation denotes that data should be moved, but the transfer is performed only when the number of concurrent transfers in progress is below a given threshold and when the data for the earlier starting jobs have already been transferred. All data staging is performed asynchronously, so the scheduling component can continue its execution. Output data for completed jobs are also retrieved in the background when similar conditions are met (initiatedatastagingout); these are then made available to the developer provided result handler. At the end of its cycle the scheduling component checks whether any data staging has finished (existsfinishedstaging).
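The staging policy just described (at most a fixed number of concurrent transfers, earlier starting jobs staged first) is not detailed further in the paper; the short Python sketch below is only an assumed illustration of such a throttle. The names StagingRequest, StagingThrottle, start_transfer and max_concurrent are hypothetical.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(order=True)
class StagingRequest:
    # Ordered by scheduled start so earlier starting jobs are staged first (assumption).
    scheduled_start: float
    job_id: str = field(compare=False)
    target_rank: int = field(compare=False)

class StagingThrottle:
    """Keeps the number of concurrent data transfers below a threshold and
    starts transfers in the order the jobs are scheduled to run."""

    def __init__(self, start_transfer: Callable[[StagingRequest], None],
                 max_concurrent: int = 2) -> None:
        self.start_transfer = start_transfer   # hypothetical OS-level copy routine
        self.max_concurrent = max_concurrent
        self.waiting: List[StagingRequest] = []
        self.in_progress = 0

    def initiate(self, request: StagingRequest) -> None:
        """Record that data should be moved (cf. initiatedatastagingin/out)."""
        self.waiting.append(request)
        self.waiting.sort()                    # earlier starting jobs first
        self._pump()

    def transfer_finished(self) -> None:
        """Called when a background transfer completes."""
        self.in_progress -= 1
        self._pump()

    def _pump(self) -> None:
        # Start queued transfers while we are below the concurrency threshold.
        while self.waiting and self.in_progress < self.max_concurrent:
            self.in_progress += 1
            self.start_transfer(self.waiting.pop(0))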
For every finished data staging, the appropriate computing component is notified (Notify) and the job execution can take place. When all jobs have been scheduled, the scheduling component waits for all jobs to complete; while waiting, it continues to service the data staging operations in progress.
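The two-phase node selection (findbestnode) used in section 3.1 is likewise described only at the level of its criteria. The sketch below shows one way the filtering and picking could look; the NodeInfo fields, the default thresholds and the tie-breaking rule are our own assumptions rather than the paper's.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NodeInfo:
    # Assumed per-node statistics reported back to the scheduling component.
    rank: int                 # MPI rank of the computing component
    free_storage: int         # bytes available for staged input data
    pending_jobs: int         # jobs already queued on this node
    pending_input_bytes: int  # input data still to be transferred for queued jobs

def findbestnode(job_input_size: int, nodes: List[NodeInfo],
                 max_pending_jobs: int = 4,
                 max_pending_bytes: int = 64 * 2**20) -> Optional[NodeInfo]:
    """Two-phase selection: filter out unsuitable nodes, then pick the best one."""
    # Phase 1: eliminate nodes that cannot (or should not) take the job.
    eligible = [n for n in nodes
                if n.free_storage >= job_input_size
                and n.pending_jobs < max_pending_jobs
                and n.pending_input_bytes < max_pending_bytes]
    if not eligible:
        return None  # caller keeps gathering statistics until a node frees up
    # Phase 2: prefer the least loaded node by the same criteria.
    return min(eligible, key=lambda n: (n.pending_jobs, n.pending_input_bytes))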
3.2. Computing component algorithm

The computing component (Fig. 3) is designed to check for the availability of new jobs and to queue jobs for the computing utility. Multiple computing utilities are allowed to be attached to the same computing component and to consume jobs from the same queue. The computing utility gets jobs from the queue, executes them, updates internal statistics and forwards the updated information to the scheduling component. This information is later used by the scheduling component to determine the availability and performance of every computing component.

Communication calls implemented in the computing component and the computing utility are exclusively locked and forced to be performed only one at a time. Locking in the computing component is performed in the waitforjob procedure, which is designed not to block on MPI calls. This is necessary due to threading issues in MPI implementations, but it also adds to the processing requirements of the algorithm.

Computing component {
  while (true) {
    job = waitforjob();
    enqueuetocompute(job, preload_data);
  }
}

Utility() {
  while (true) {
    job = dequeuejob();
    execute(job);
    lock(communication)
    sendstatisticstoschedulingcomponent();
    unlock(communication)
  }
}

Figure 3. Computing component algorithm
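As a rough illustration of the computing component just described, the sketch below pairs a job queue with worker code and a lock that serializes all communication with the scheduler. It uses plain Python threading primitives instead of MPI calls, and every name in it (COMM_LOCK, JOB_QUEUE, fetch_next_job, report_statistics) is a hypothetical stand-in rather than part of the LBA code.

import queue
import threading
import time
from typing import Callable, Optional

# Hypothetical stand-ins for the MPI-based signaling described above.
COMM_LOCK = threading.Lock()      # communication calls are performed one at a time
JOB_QUEUE: "queue.Queue[Optional[bytes]]" = queue.Queue()

def computing_component(fetch_next_job: Callable[[], Optional[bytes]]) -> None:
    """Checks for new jobs (cf. waitforjob) and queues them for the utilities."""
    while True:
        job = fetch_next_job()    # polls the scheduler without blocking on MPI
        if job is None:
            break
        JOB_QUEUE.put(job)

def computing_utility(execute: Callable[[bytes], bytes],
                      report_statistics: Callable[[float], None]) -> None:
    """Dequeues jobs, executes them and reports statistics under the lock."""
    while True:
        job = JOB_QUEUE.get()
        if job is None:           # sentinel used here to stop the utility
            break
        start = time.time()
        execute(job)
        with COMM_LOCK:           # exclusive access to the communication channel
            report_statistics(time.time() - start)

Several computing_utility workers can be started as threads that consume from the same JOB_QUEUE, mirroring the statement above that multiple computing utilities may be attached to one computing component.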
4. Experimental results

In order to create a job pattern that includes heavy computation and high data throughput, we decided to use data compression as the main test scenario. A number of files are provided as input to the system, and the task is to perform extensive data compression on them. The provided files are suitable for compression and achieve high compression ratios; the size of each file is 1 MB. An appropriate job generator for the specified job model is provided.

The measurements were performed on 5 workstations connected by a 100 Mbps LAN. Each workstation is a 2800 MHz Pentium 4 machine, equipped with 1 GB RAM and a 10,000 rpm SCSI hard drive. The results obtained from 20 experiments are illustrated in Fig. 4. Job sets of different sizes are combined with several cluster sizes in order to determine the performance of the scheduling system.

Figure 4. Experimental results (processing speed in jobs/second versus the number of computing components, for job sets of 100, 200, 300 and 400 jobs, with a linear increase shown for reference)

It can be observed that an increase in the size of the job set is followed by an increase in the processing speed of the entire cluster. This is because the initial setup time of the application significantly affects performance results on small job sets. The initial setup time includes the time necessary for all the components in the distributed environment to bootstrap, together with the data staging time that is spent before the first compression takes place. The performance increase is linear in scenarios with two or three workstations employed in data compression. The addition of the fourth and fifth workstations results in a performance gain that asymptotes to the value of 9 jobs/second. This is equivalent to 9 MB/s (1 MB per job) of outgoing network traffic on the workstation hosting the scheduling component. Since the network is theoretically limited to 100 Mbps (12 MB/s), the performance of the distributed system is bounded by the network interface limitation of the scheduling workstation. In the case of a network upgrade we expect the next bottleneck to arise in the hard drive performance of the same workstation, which could be accompanied by its limited processing power. A workaround for this problem would be to distribute the scheduling component and the job generators across several workstations, which would increase outgoing network performance. This would require all data (uncompressed files) to be available to all schedulers, which is a separate optimization goal. The schedulers would also have to be modified to synchronize among themselves in order to prevent duplication of jobs in the system.

5. Conclusion

In this paper we have defined an extendable architecture that can be used to develop parallel applications. Several interfaces are defined that developers can use to provide custom job generators and result handlers and to define custom scheduling policies. A straightforward job scheduling algorithm targeted at high throughput applications is given to demonstrate an implementation of the architecture. We focused our attention on network throttling by timing the data staging process in order to preload data for jobs before the compute node becomes available. An experiment with an independent job set and high throughput data transfer to the computing components was completed. The results indicate that the system bottleneck is the only component in the system that is not distributed, i.e. the scheduling workstation. The next logical step in extending the proposed architecture is to distribute scheduling. We also plan to extend this framework to achieve continuous data flow from the job generator through the compute nodes to the result handler, targeting near real-time data processing.

6. References

[1] Chavarría-Miranda DG, Mellor-Crummey JM. An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques; 2002 Sep 22-25; Charlottesville, Virginia, USA.
[2] Folding@home Main.
[3] Garlan D, Shaw M. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall.
[4] Grid Engine.
[5] Grudenic I, Bogunovic N. Modeling and Verification of MPI Based Distributed Software. LNCS 2006; 4192.
[6] Krammer B, Müller MS, Resch MM. Runtime Checking of MPI Applications with MARMOT. In: Joubert GR, Nagel WE, Peters FJ, Plata O, Tirado P, Zapata E, editors. Proceedings of the International Conference on Parallel Computing; 2005 Sep 13-16; Malaga, Spain. John von Neumann Institute for Computing, Jülich.
[7] Luecke G, Chen H, Coyle J, Hoekstra J, Kraeva M, Zou Y. MPI-CHECK: a tool for checking Fortran 90 MPI programs. Concurrency and Computation: Practice and Experience 2003; 15(2).
[8] Moab Cluster Software Suite. …ucts/moab-cluster-suite.php
[9] Pllana S, Fahringer T. Performance Prophet: A Performance Modeling and Prediction Tool for Parallel and Distributed Programs. Proceedings of the International Conference on Parallel Processing, Performance Evaluation of Networks for Parallel, Cluster and Grid Systems; 2005 Jun 14-17; Oslo, Norway: IEEE Computer Society.
[10] SETI@home: Search for Extraterrestrial Intelligence at Home.
[11] Siegel SF. Verifying Parallel Programs with MPI-Spin. LNCS 2007; 4757.
[12] The Open Source Distributed Render Queue (DrQueue) [accessed 04/25/2008].
[13] The TotalView Debugger (TVD) [accessed 04/25/2008].
