Relating Empirical Performance Data to Achievable Parallel Application Performance

Published in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'99), Vol. III, Las Vegas, Nev., USA, June 28-July 1, 1999, pp. [...]

Roy W. Melton, Cecil O. Alford, Philip R. Bingham, and Tsai Chi Huang
Computer Engineering Research Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology
777 Atlantic Dr. Ste. 496, Atlanta, GA

Abstract. Parallel computing offers the best execution performance for many large, compute-intensive applications. Whereas the overall computing requirements of such applications make the choice of a parallel computing paradigm clear, the suitability of one particular parallel computer versus another for a given application is often less clear. Research is underway to address this question of computational suitability by predicting the parallel performance of an application from the underlying computational metrics of that application and of candidate architectures. This paper presents two case studies in which empirical performance data have been used to predict parallel application performance.

Keywords: MIMD; Parallel Applications; Performance Analysis; Performance Prediction

1 Parallel Applications and Performance

Parallel computing has emerged as a means of achieving the fastest execution of many computationally challenging applications. Reports of applications ported to various parallel architectures account for much of the scientific computing literature and communications published over the last decade. Metrics such as parallel efficiency and speedup are the ubiquitous hallmarks of this body of work, which summarizes specific case studies and implementations.
While such analysis confirms the rewards of particular parallelization efforts, it often provides no explicit insight into the performance of significantly different parallelization efforts. This lack of insight hinders the development and deployment of applications to their full parallel potential. For the most computationally demanding applications, the fastest execution time is the ultimate goal. Realizing it for a particular application requires shifting the focus from specific implementations to the generalized parallel performance characteristics of that application. Such performance research will then provide a path to optimal performance of a given application on its intended parallel computing platform.

The foundations for this parallel performance research lie in work addressing the parallel scalability of various algorithms and applications. Typically, such work is based on an algorithmic analysis in terms of the order of operations required. This analysis is then applied to existing or theoretical future parallel computing architectures to project upper and lower bounds on parallel efficiency and/or speedup. Current published parallel performance research reflects this body of work.

Moving beyond this status quo of generalized asymptotic estimates of scalability toward more realistic estimates of achievable performance requires a change in the underlying analysis. Current algorithmic analysis expressed in order of operations allows asymptotic parallel performance estimates for general classes of parallel computers. More accurate computational metrics of algorithms and their aggregate applications will permit better estimates of parallel performance on actual hardware. The two case studies that follow investigate the use of measured operational performance on actual hardware to predict parallel application performance.

2 Image Processing Application

A parallel performance estimation technique has been applied to an image processing application in order to establish the computational requirements necessary to support the application. The Computer Engineering Research Laboratory at Georgia Tech designed and implemented a set of parallel processors for a real-time object detection application as part of a research contract. During the term of the contract, it was known that no commercially available processors could implement the application; however, in anticipation of projected increases in the computing power of future commercially available processors, there was interest in quantifying the computational requirements of the application.

Consisting of six custom processors, the Georgia Tech VLSI signal processor chip set (GT-VSP) [1] performs parallel operations to identify desired objects within a real-time image stream. Each of the six types of GT-VSP elements implements a different image processing task: non-uniformity compensation (NUC), temporal filtering (TF), spatial filtering (SF), thresholding (THR), clustering (CLS), and centroiding (CTR). The heterogeneous processors operate as a pipeline (as shown in Figure 1) to process 128x128-pixel image frames in parallel at up to 100 frames per second (fps) [2].
Although the custom GT-VSP intrinsically supports the application, its processing capabilities in terms of standard computational metrics suitable for comparison with commercially available processors (e.g., millions of instructions per second [MIPS], millions of floating point operations per second [MFLOPS], and millions of operations per second [MOPS]) are not readily apparent. Therefore, to derive the GT-VSP computational requirements, the GT-VSP algorithms were implemented in software [3] and run on a well-benchmarked machine. Scaling the empirical performance data by this machine's published performance figures yielded standardized performance estimates for the GT-VSP system implementation.

[Figure 1. GT-VSP Processing Pipeline: Non-Uniformity Compensation (GT-VNUC, with 5 x 256 x 256 x 16 NUC memory), Temporal Filtering (GT-VTF, with 4 x 256 x 256 x 16 TF memory), Spatial Filtering, Thresholding, Clustering (GT-VCLS), and Centroiding (GT-VCTR, X and Y); control signals Begin_Frame/End_Frame and Begin_Row/End_Row; 16-bit pixel input Pixel_Int[15:0].]

2.1 Empirical Performance Data

A Sun SPARCstation 2 (SS2) in single-user mode executed the software versions of the GT-VSP algorithms repeatedly to generate execution timings. Each GT-VSP algorithm was benchmarked both in its direct form from [3] as C code and in one or more computationally optimized C versions. Figure 2 shows the benchmarks for THR in simple mode. The algorithmic version with the best timing was then used for the standardized performance estimation.

Two main issues drove the performance measurement methodology. First, for the methodology to succeed, the software implementation needed to reflect the actual hardware operation as fully as possible. Hardware and software each tend to impart their own optimizations to a given algorithmic implementation, so evaluating an implementation in one paradigm against the other requires compensating for these intrinsic optimization techniques.

An optimized software implementation can avoid some operations inherent in a hardware solution. Software operating on general-purpose processors affords some performance optimizations that do not reflect the custom GT-VSP's operation. Whereas compilers and/or processor hardware can minimize operations based on test conditions and image data (e.g., avoiding multiplication by zero or one), GT-VSP must perform the worst-case computation for every pixel, because it guarantees 100 fps processing. The functions within the algorithms and the implementation constraints thus affect the hardware in ways that are not reflected in software. Consequently, GT-VSP performance is best characterized by benchmarking the software with worst-case image data (i.e., the highest number of objects to detect, non-zero/non-unity filter coefficients, etc.); this scenario corresponds to a real processing situation, and any replacement computing hardware must be able to handle it within the required time constraint.

For most of the GT-VSP algorithms, worst-case image data correspond to a single image to be processed; benchmarking CLS, however, required evaluating an ensemble of images. Unlike the other algorithms, CLS is affected not only by how many object (i.e., non-zero) pixels are in a frame but also by how they are arranged in the frame, since it identifies all contiguous non-zero pixels as belonging to a unique object. Because pixels are evaluated in raster-scan order, what first appears to be a unique object sometimes turns out to be part of a larger object, and the object data must then be merged. Characterizing CLS performance therefore required a variety of images that exercise both its object detection and its object merging.

Conversely, an optimized hardware implementation can avoid some data access overhead inherent in a software abstraction. GT-VSP, for example, does not have to maintain pixel array indices within an image as the software does, so in the software implementations array indices were replaced with direct pointers to pixels wherever practical.

```c
/* Assumed definitions, not given in the figure: 16-bit pixels, 128x128 frames. */
typedef unsigned short pixel_type;
#define rows 128
#define columns 128

/* Direct version: two-dimensional array indexing. */
void SimpleThreshold(pixel_type Lower, pixel_type Upper, long *Count,
                     pixel_type In[rows][columns], pixel_type Out[rows][columns])
{
    int Row, Column;
    pixel_type Pixel;

    *Count = 0;
    for (Row = 0; Row < rows; Row++) {
        for (Column = 0; Column < columns; Column++) {
            Pixel = In[Row][Column];
            /* Keep pixels inside [Lower, Upper] and count them; zero the rest. */
            if ((Pixel >= Lower) && (Pixel <= Upper)) {
                (*Count)++;
                Out[Row][Column] = Pixel;
            } else {
                Out[Row][Column] = (pixel_type) 0;
            }
        }
    }
}

/* Optimized version (renamed so both compile together): running pixel
   pointers replace the per-pixel index arithmetic. */
void SimpleThresholdOptimized(pixel_type Lower, pixel_type Upper, long *Count,
                              pixel_type In[rows][columns], pixel_type Out[rows][columns])
{
    int I, J;
    const pixel_type *InPtr;
    pixel_type *OutPtr, Pixel;

    *Count = 0;
    for (I = 0; I < rows; I++) {
        /* Re-anchor the pointers at each row start (the figure sets them once). */
        InPtr = In[I];
        OutPtr = Out[I];
        for (J = 0; J < columns; J++, InPtr++, OutPtr++) {
            Pixel = *InPtr;
            if ((Pixel >= Lower) && (Pixel <= Upper)) {
                (*Count)++;
                *OutPtr = Pixel;
            } else {
                *OutPtr = (pixel_type) 0;
            }
        }
    }
}
```

Figure 2. THR Simple Algorithm: Direct (first) and Optimized (second) C Versions

Secondly, software benchmarks of custom hardware needed to handle differing data representations. GT-VSP uses fixed-point arithmetic with 16 to 35 bits of precision as needed to preserve accuracy.
Therefore, both 32-bit integer and 32-bit floating point software versions were evaluated.

2.2 Parallel Performance Estimation

The SS2 is a machine with many existing benchmarks; its published performance figures are 28.5 MIPS and 4.2 MFLOPS. The time required to process a single image frame with each GT-VSP algorithm on this computer was measured. The SS2's maximum frame rate (MFR) for each algorithm was then calculated as the reciprocal of the observed frame time. Table 1 shows the MFRs along with the estimated GT-VSP MIPS requirement. The MIPS requirement was determined by scaling the SS2's published 28.5 MIPS by the factor 100/MFR (included in Table 1), i.e., the factor by which a processor must outperform the SS2 to reach GT-VSP's 100 fps processing rate.

[Table 1. GT-VSP Performance Estimate in MIPS from SS2 Benchmark. Two panels, Floating Point Operations and Integer Operations, each with columns GT-VSP Algorithm, MFR (fps), 100/MFR (1/[100 fps]), and MIPS for 100 fps, and rows NUC, SF, TF, THR, CLS/CTR, and Total.]

Table 2 shows the operation count for each GT-VSP algorithm in MOPS, along with the computed SS2 MFLOPS requirement for 100 fps and the ratio of MFLOPS to MOPS. The SS2 MFLOPS requirement was determined by scaling the MIPS requirement from Table 1 by the factor 4.2/28.5, the ratio of SS2 MFLOPS to MIPS. That the MOPS and MIPS figures differ shows that the GT-VSP and software implementations handle calculations differently. The higher ratios for the essentially algebraic NUC, SF, TF, and THR algorithms indicate additional overhead in the software versions, whereas the GT-VSP version of the CLS/CTR function exhibits overhead not present in the software version.

3 Global Climate Modeling Application

In addition to image processing, parallel performance estimation based on empirical computational measurements has been applied to facilitate the efficient parallelization of a global climate change model. Execution profiles of the original serial version of the model showed that one algorithmic component was responsible for most of the model's serial execution time. A parallel version of this component was then evaluated.
The measured performance results were then used to predict the performance of the parallel model. After the model was parallelized, its performance was measured and compared with the predictions.

[Table 2. GT-VSP Operation Count and Estimate in MFLOPS from SS2 Benchmark. Columns: GT-VSP Algorithm, MOPS at 100 fps, MFLOPS for 100 fps, and Ratio MFLOPS/MOPS; rows: NUC, SF, TF, THR, CLS/CTR, and Total.]

The Enhanced Dynamical/Chemical Model of Atmospheric Ozone [4], a global climate change model used and refined over more than two decades [5], solves a set of time-dependent partial differential equations in three spatial dimensions (illustrated in Figure 3). It uses a finite difference method in time and in the vertical dimension, whose coordinate, pressure, is discretized into 32 atmospheric levels extending from the earth's surface up to 85 km. The horizontal dimensions are solved using a spectral method truncated triangularly at wave number 19 (T19), which corresponds to a 64x28 longitude-latitude grid. During each time step, data are transformed between physical grid space, where the model physics are computed, and spectral space, where the model dynamics are computed.

[Figure 3. Global Climate Modeling Application: the earth's atmosphere discretized into 32 atmospheric levels, 64 longitudes, and 28 latitudes, mapped onto a Transputer array by latitude pair and level.]

The first parallel version of the model distributed data and computations across 32 Inmos T800 Transputers by atmospheric level (one processor per level) [6]. This vertical parallelization executed more efficiently than the serial version. Further parallelization of the model required splitting the compute-intensive spectral transform, so a preliminary computational analysis was conducted in the hope of achieving a more efficient result.

3.1 Empirical Performance Data

Execution profiles of the original serial code revealed that over 70% of the execution time is spent in the spectral transform code. The transform consists of two phases: a Gaussian quadrature approximation to a Legendre transform for each model grid longitude (spectral wave number), and a Fourier transform for each model grid latitude (spectral wave index). The performance of the transform parallelized across latitudes was evaluated, since this distribution keeps the well-known, computationally efficient FFTs local to a single processor. Fully parallelized across 14 latitude processors (28 model latitudes in north/south pairs), the spectral transform exhibited a speedup of [...]. This measured transform performance is in line with results published for other spectral models [7, 8].

Following this preliminary performance analysis of the transform, the model was parallelized across the latitude dimension to produce the scalable processor mapping shown in Figure 3. The correctness of the parallel model was verified by comparing its output with that of the original serial model. The performance of several processor configurations was then measured in terms of the average time to complete a model time step. For these measurements, no model data were written to disk; the unoptimized parallel I/O would have introduced significant serial overhead not reflected in the spectral transform benchmark.
3.2 Parallel Performance Estimation

Applying Amdahl's law, expressed as

$$t_N = \left( f_S + \frac{f_P}{N} \right) t_S,$$

where $N$ is the number of processors, $t_N$ is the execution time on $N$ processors, $f_S$ is the serial fraction, $f_P = 1 - f_S$ is the parallel fraction, and $t_S$ is the serial execution time, the serial fraction of the spectral transform is [...].

Most overhead incurred in the latitude parallelization affects only the spectral transform. Therefore, based on the execution profile of the original serial code (see Section 3.1), the serial fraction of the overall model is roughly 0.053, that is, 70% of the spectral transform's serial fraction. From the measured model execution time and the derived serial fraction, parallel performance estimates were computed for various degrees of latitude parallelization, as shown in Table 3. Table 3 also gives the actual performance measurements, the percentage error of the estimates, and the distribution of latitudes to processors.

[Table 3. Parallel GCCM Performance by Latitude. Columns: Total Processors; Latitude Processors, $N$; Latitude Distribution; Actual Time, $t_N$ (s); Estimated Time (s); Error (%).]

The parallel performance estimates are within 5% of the actual performance except in the last case. Whereas the latitude distributions from 2 up to 5 processors produced a strictly decreasing maximum number of latitudes per processor, the last distribution, to 6 processors, did not reduce that maximum.

To evaluate the full scalable potential of the model, additional Transputers would need to be configured. Only two meaningful distributions (i.e., those that reduce the maximum number of latitudes per processor with a minimum number of processors) remain to be measured: 7 latitude processors (224 total processors) and 14 latitude processors (448 total processors). Currently, only 192 processors are configured for the GCCM.

4 Conclusions

The use of measured operational performance on actual hardware to predict parallel performance has been presented for two applications: image processing and global climate modeling. In the first case, performance data were used to characterize the computational requirements needed to match custom parallel hardware. In the second case, performance data on a major algorithmic component were used to predict the performance resulting from a given parallelization effort. While these two examples illustrate that parallel performance can be predicted from empirical results, they also suggest that care must be taken when comparing serial and parallel implementations (e.g., ensuring that the measured serial program reflects the computational methodology of the parallel application).

The objective of using empirical computational metrics rather than traditional order-of-operations analysis is to produce accurate parallel application performance predictions rather than loose performance bounds. For many applications, achieving the fastest solution with today's implementation is more important than knowing the theoretically ideal implementation. Real performance data from target processors can show how to obtain the fastest available execution of an application, provided that the necessary parameters are measured and accounted for. Further research is needed to determine that set of parameters and the equations that translate them into performance estimates.

5 References

[1] W. S. Tan et al., "A High-Performance Modular Signal Processor for Object Detection," Proceedings of the 1990 Government Microcircuit Applications Conference (GOMAC), Las Vegas, Nev., USA, November 4-8, 1990, pp. [...].
[2] R. W. Melton et al., "A VLSI System Implementation for Real-Time Object Detection," 1996 IEEE International Symposium on Circuits and Systems (ISCAS'96), Vol. 4, Atlanta, Ga., USA, May 12-15, 1996, pp. [...].
[3] A. M. Henshaw et al., "Signal Processing Algorithms-Georgia Tech Benchmark," Special Technical Report, Report No. STR [...], Computer Engineering Research Laboratory, School of Electrical Engineering, Georgia Institute of Technology, February 27, [...].
[4] F. N. Alyea et al., "An Enhanced Dynamical/Chemical Model of Atmospheric Ozone," School of Geophysical Sciences, Georgia Institute of Technology, Atlanta, Ga., USA, July [...].
[5] D. Cunnold et al., "A Three-Dimensional Dynamical-Chemical Model of Atmospheric Ozone," Journal of the Atmospheric Sciences, Vol. 32, Jan. 1975, pp. [...].
[6] R. W. Melton et al., "A Transputer-Based Scalable, Parallel Global Climate Change Model," Transputer Research and Applications Conference 7 (NATUG 7), Athens, Ga., USA, October 23-25, 1994, pp. [...].
[7] G. Carver, "A Spectral Meteorological Model on the ICL DAP," Parallel Computing, Vol. 8, No. 1-3, Oct. 1988, pp. [...].
[8] D. F. Snelling, "A High Resolution Parallel Legendre Transform Algorithm," Supercomputing (Proceedings of the First International Conference), 1988, pp. [...].

Parallel Scalable Algorithms- Performance Parameters

Parallel Scalable Algorithms- Performance Parameters www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

More information

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems 202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric

More information

THE NAS KERNEL BENCHMARK PROGRAM

THE NAS KERNEL BENCHMARK PROGRAM THE NAS KERNEL BENCHMARK PROGRAM David H. Bailey and John T. Barton Numerical Aerodynamic Simulations Systems Division NASA Ames Research Center June 13, 1986 SUMMARY A benchmark test program that measures

More information

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Quiz for Chapter 1 Computer Abstractions and Technology 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

Systolic Computing. Fundamentals

Systolic Computing. Fundamentals Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW

More information

Performance metrics for parallel systems

Performance metrics for parallel systems Performance metrics for parallel systems S.S. Kadam C-DAC, Pune sskadam@cdac.in C-DAC/SECG/2006 1 Purpose To determine best parallel algorithm Evaluate hardware platforms Examine the benefits from parallelism

More information

Design of Remote data acquisition system based on Internet of Things

Design of Remote data acquisition system based on Internet of Things , pp.32-36 http://dx.doi.org/10.14257/astl.214.79.07 Design of Remote data acquisition system based on Internet of Things NIU Ling Zhou Kou Normal University, Zhoukou 466001,China; Niuling@zknu.edu.cn

More information

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Synchronization of sampling in distributed signal processing systems

Synchronization of sampling in distributed signal processing systems Synchronization of sampling in distributed signal processing systems Károly Molnár, László Sujbert, Gábor Péceli Department of Measurement and Information Systems, Budapest University of Technology and

More information

CONFIGURATION GUIDELINES: EMC STORAGE FOR PHYSICAL SECURITY

CONFIGURATION GUIDELINES: EMC STORAGE FOR PHYSICAL SECURITY White Paper CONFIGURATION GUIDELINES: EMC STORAGE FOR PHYSICAL SECURITY DVTel Latitude NVMS performance using EMC Isilon storage arrays Correct sizing for storage in a DVTel Latitude physical security

More information

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

More information

System Models for Distributed and Cloud Computing

System Models for Distributed and Cloud Computing System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems

More information

A Systolic Algorithm to Process Compressed Binary Images

A Systolic Algorithm to Process Compressed Binary Images A Systolic Algorithm to Process Compressed Binary Images Fikret Ercal, Mark Allen, and Hao Feng University of Missouri Rolla Department of Computer Science and Intelligent Systems Center Rolla, MO 65401

More information

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department

More information

Load Balancing on a Grid Using Data Characteristics

Load Balancing on a Grid Using Data Characteristics Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu

More information

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1} An Efficient RNS to Binary Converter Using the oduli Set {n + 1, n, n 1} Kazeem Alagbe Gbolagade 1,, ember, IEEE and Sorin Dan Cotofana 1, Senior ember IEEE, 1. Computer Engineering Laboratory, Delft University

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

More information

on an system with an infinite number of processors. Calculate the speedup of

on an system with an infinite number of processors. Calculate the speedup of 1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements

More information

Parallel Analysis and Visualization on Cray Compute Node Linux

Parallel Analysis and Visualization on Cray Compute Node Linux Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory

More information

Some Computer Organizations and Their Effectiveness. Michael J Flynn. IEEE Transactions on Computers. Vol. c-21, No.

Some Computer Organizations and Their Effectiveness. Michael J Flynn. IEEE Transactions on Computers. Vol. c-21, No. Some Computer Organizations and Their Effectiveness Michael J Flynn IEEE Transactions on Computers. Vol. c-21, No.9, September 1972 Introduction Attempts to codify a computer have been from three points

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

IS-ENES/PrACE Meeting EC-EARTH 3. A High-resolution Configuration

IS-ENES/PrACE Meeting EC-EARTH 3. A High-resolution Configuration IS-ENES/PrACE Meeting EC-EARTH 3 A High-resolution Configuration Motivation Generate a high-resolution configuration of EC-EARTH to Prepare studies of high-resolution ESM in climate mode Prove and improve

More information

Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

More information

A Pattern-Based Approach to. Automated Application Performance Analysis

A Pattern-Based Approach to. Automated Application Performance Analysis A Pattern-Based Approach to Automated Application Performance Analysis Nikhil Bhatia, Shirley Moore, Felix Wolf, and Jack Dongarra Innovative Computing Laboratory University of Tennessee (bhatia, shirley,

More information

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case. Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machine-independent measure of efficiency Growth rate Complexity

More information

BUSINESS RULES AND GAP ANALYSIS

BUSINESS RULES AND GAP ANALYSIS Leading the Evolution WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Discovery and management of business rules avoids business disruptions WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Business Situation More

More information

Operation Count; Numerical Linear Algebra

Operation Count; Numerical Linear Algebra 10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY. 3.1 Basic Concepts of Digital Imaging

CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY. 3.1 Basic Concepts of Digital Imaging Physics of Medical X-Ray Imaging (1) Chapter 3 CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY 3.1 Basic Concepts of Digital Imaging Unlike conventional radiography that generates images on film through

More information

Project Management Process

Project Management Process Project Management Process Description... 1 STAGE/STEP/TASK SUMMARY LIST... 2 Project Initiation 2 Project Control 4 Project Closure 5 Project Initiation... 7 Step 01: Project Kick Off 10 Step 02: Project

More information

Determining Total Cost of Ownership for Data Center and Network Room Infrastructure

Determining Total Cost of Ownership for Data Center and Network Room Infrastructure Determining Total Cost of Ownership for Data Center and Network Room Infrastructure White Paper #6 Revision 3 Executive Summary An improved method for measuring Total Cost of Ownership of data center and

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

A Prediction-Based Transcoding System for Video Conference in Cloud Computing

A Prediction-Based Transcoding System for Video Conference in Cloud Computing A Prediction-Based Transcoding System for Video Conference in Cloud Computing Yongquan Chen 1 Abstract. We design a transcoding system that can provide dynamic transcoding services for various types of

More information

Grid Computing Vs. Cloud Computing

Grid Computing Vs. Cloud Computing International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid

More information

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis

Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis Performance Metrics and Scalability Analysis 1 Performance Metrics and Scalability Analysis Lecture Outline Following Topics will be discussed Requirements in performance and cost Performance metrics Work

More information

Capacity Planning Process Estimating the load Initial configuration

Capacity Planning Process Estimating the load Initial configuration Capacity Planning Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting

More information

Motivation: Smartphone Market

Motivation: Smartphone Market Motivation: Smartphone Market Smartphone Systems External Display Device Display Smartphone Systems Smartphone-like system Main Camera Front-facing Camera Central Processing Unit Device Display Graphics

More information

High-speed image processing algorithms using MMX hardware

High-speed image processing algorithms using MMX hardware High-speed image processing algorithms using MMX hardware J. W. V. Miller and J. Wood The University of Michigan-Dearborn ABSTRACT Low-cost PC-based machine vision systems have become more common due to

More information

Studying Code Development for High Performance Computing: The HPCS Program

Studying Code Development for High Performance Computing: The HPCS Program Studying Code Development for High Performance Computing: The HPCS Program Jeff Carver 1, Sima Asgari 1, Victor Basili 1,2, Lorin Hochstein 1, Jeffrey K. Hollingsworth 1, Forrest Shull 2, Marv Zelkowitz

More information

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern: Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

More information

RN-Codings: New Insights and Some Applications

RN-Codings: New Insights and Some Applications RN-Codings: New Insights and Some Applications Abstract During any composite computation there is a constant need for rounding intermediate results before they can participate in further processing. Recently

More information

A General Framework for Tracking Objects in a Multi-Camera Environment

A General Framework for Tracking Objects in a Multi-Camera Environment A General Framework for Tracking Objects in a Multi-Camera Environment Karlene Nguyen, Gavin Yeung, Soheil Ghiasi, Majid Sarrafzadeh {karlene, gavin, soheil, majid}@cs.ucla.edu Abstract We present a framework

More information

4.3. David E. Rudack*, Meteorological Development Laboratory Office of Science and Technology National Weather Service, NOAA 1.

4.3. David E. Rudack*, Meteorological Development Laboratory Office of Science and Technology National Weather Service, NOAA 1. 43 RESULTS OF SENSITIVITY TESTING OF MOS WIND SPEED AND DIRECTION GUIDANCE USING VARIOUS SAMPLE SIZES FROM THE GLOBAL ENSEMBLE FORECAST SYSTEM (GEFS) RE- FORECASTS David E Rudack*, Meteorological Development

More information

FPGA area allocation for parallel C applications

FPGA area allocation for parallel C applications 1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University

More information

Selection of Techniques and Metrics

Selection of Techniques and Metrics Performance Evaluation: Selection of Techniques and Metrics Hongwei Zhang http://www.cs.wayne.edu/~hzhang Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain. Outline Selecting

More information

Computer Architecture TDTS10

Computer Architecture TDTS10 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

More information

Proposal and Development of a Reconfigurable Parallel Job Scheduling Algorithm

Proposal and Development of a Reconfigurable Parallel Job Scheduling Algorithm Proposal and Development of a Reconfigurable Parallel Job Scheduling Algorithm Luís Fabrício Wanderley Góes, Carlos Augusto Paiva da Silva Martins Graduate Program in Electrical Engineering PUC Minas {lfwgoes,capsm}@pucminas.br

More information

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,

More information

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server Symantec Backup Exec 10d System Sizing Best Practices For Optimizing Performance of the Continuous Protection Server Table of Contents Table of Contents...2 Executive Summary...3 System Sizing and Performance

More information

Building Scalable Applications Using Microsoft Technologies

Building Scalable Applications Using Microsoft Technologies Building Scalable Applications Using Microsoft Technologies Padma Krishnan Senior Manager Introduction CIOs lay great emphasis on application scalability and performance and rightly so. As business grows,

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Building an Inexpensive Parallel Computer

Building an Inexpensive Parallel Computer Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

More information

RN-coding of Numbers: New Insights and Some Applications

RN-coding of Numbers: New Insights and Some Applications RN-coding of Numbers: New Insights and Some Applications Peter Kornerup Dept. of Mathematics and Computer Science SDU, Odense, Denmark & Jean-Michel Muller LIP/Arénaire (CRNS-ENS Lyon-INRIA-UCBL) Lyon,

More information

PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM

PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM PIXEL-LEVEL IMAGE FUSION USING BROVEY TRANSFORME AND WAVELET TRANSFORM Rohan Ashok Mandhare 1, Pragati Upadhyay 2,Sudha Gupta 3 ME Student, K.J.SOMIYA College of Engineering, Vidyavihar, Mumbai, Maharashtra,

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

White Paper. Recording Server Virtualization

White Paper. Recording Server Virtualization White Paper Recording Server Virtualization Prepared by: Mike Sherwood, Senior Solutions Engineer Milestone Systems 23 March 2011 Table of Contents Introduction... 3 Target audience and white paper purpose...

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

More information

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

More information

Performance metrics for parallelism

Performance metrics for parallelism Performance metrics for parallelism 8th of November, 2013 Sources Rob H. Bisseling; Parallel Scientific Computing, Oxford Press. Grama, Gupta, Karypis, Kumar; Parallel Computing, Addison Wesley. Definition

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

A Review of Customized Dynamic Load Balancing for a Network of Workstations

A Review of Customized Dynamic Load Balancing for a Network of Workstations A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester

More information

Facts about Visualization Pipelines, applicable to VisIt and ParaView

Facts about Visualization Pipelines, applicable to VisIt and ParaView Facts about Visualization Pipelines, applicable to VisIt and ParaView March 2013 Jean M. Favre, CSCS Agenda Visualization pipelines Motivation by examples VTK Data Streaming Visualization Pipelines: Introduction

More information

Performance Workload Design

Performance Workload Design Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles

More information

On some Potential Research Contributions to the Multi-Core Enterprise

On some Potential Research Contributions to the Multi-Core Enterprise On some Potential Research Contributions to the Multi-Core Enterprise Oded Maler CNRS - VERIMAG Grenoble, France February 2009 Background This presentation is based on observations made in the Athole project

More information

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING Hussain Al-Asaad and Alireza Sarvi Department of Electrical & Computer Engineering University of California Davis, CA, U.S.A.

More information

A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti 2, Nidhi Rajak 3

A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti 2, Nidhi Rajak 3 A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti, Nidhi Rajak 1 Department of Computer Science & Applications, Dr.H.S.Gour Central University, Sagar, India, ranjit.jnu@gmail.com

More information

Monday January 19th 2015 Title: "Transmathematics - a survey of recent results on division by zero" Facilitator: TheNumberNullity / James Anderson, UK

Monday January 19th 2015 Title: Transmathematics - a survey of recent results on division by zero Facilitator: TheNumberNullity / James Anderson, UK Monday January 19th 2015 Title: "Transmathematics - a survey of recent results on division by zero" Facilitator: TheNumberNullity / James Anderson, UK It has been my pleasure to give two presentations

More information

Benchmarking Large Scale Cloud Computing in Asia Pacific

Benchmarking Large Scale Cloud Computing in Asia Pacific 2013 19th IEEE International Conference on Parallel and Distributed Systems ing Large Scale Cloud Computing in Asia Pacific Amalina Mohamad Sabri 1, Suresh Reuben Balakrishnan 1, Sun Veer Moolye 1, Chung

More information

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Intel Cloud Builders Guide Intel Xeon Processor-based Servers RES Virtual Desktop Extender Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Client Aware Cloud with RES Virtual

More information

SUPER RESOLUTION FROM MULTIPLE LOW RESOLUTION IMAGES

SUPER RESOLUTION FROM MULTIPLE LOW RESOLUTION IMAGES SUPER RESOLUTION FROM MULTIPLE LOW RESOLUTION IMAGES ABSTRACT Florin Manaila 1 Costin-Anton Boiangiu 2 Ion Bucur 3 Although the technology of optical instruments is constantly advancing, the capture of

More information

Floating Point Fused Add-Subtract and Fused Dot-Product Units

Floating Point Fused Add-Subtract and Fused Dot-Product Units Floating Point Fused Add-Subtract and Fused Dot-Product Units S. Kishor [1], S. P. Prakash [2] PG Scholar (VLSI DESIGN), Department of ECE Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu,

More information

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITION-BASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun

More information

telemetry Rene A.J. Chave, David D. Lemon, Jan Buermans ASL Environmental Sciences Inc. Victoria BC Canada rchave@aslenv.com I.

telemetry Rene A.J. Chave, David D. Lemon, Jan Buermans ASL Environmental Sciences Inc. Victoria BC Canada rchave@aslenv.com I. Near real-time transmission of reduced data from a moored multi-frequency sonar by low bandwidth telemetry Rene A.J. Chave, David D. Lemon, Jan Buermans ASL Environmental Sciences Inc. Victoria BC Canada

More information

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.

There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems. ASSURING PERFORMANCE IN E-COMMERCE SYSTEMS Dr. John Murphy Abstract Performance Assurance is a methodology that, when applied during the design and development cycle, will greatly increase the chances

More information

PHOTOGRAMMETRIC TECHNIQUES FOR MEASUREMENTS IN WOODWORKING INDUSTRY

PHOTOGRAMMETRIC TECHNIQUES FOR MEASUREMENTS IN WOODWORKING INDUSTRY PHOTOGRAMMETRIC TECHNIQUES FOR MEASUREMENTS IN WOODWORKING INDUSTRY V. Knyaz a, *, Yu. Visilter, S. Zheltov a State Research Institute for Aviation System (GosNIIAS), 7, Victorenko str., Moscow, Russia

More information

Parallels Virtuozzo Containers vs. VMware Virtual Infrastructure:

Parallels Virtuozzo Containers vs. VMware Virtual Infrastructure: Parallels Virtuozzo Containers vs. VMware Virtual Infrastructure: An Independent Architecture Comparison TABLE OF CONTENTS Introduction...3 A Tale of Two Virtualization Solutions...5 Part I: Density...5

More information

Eight Ways to Increase GPIB System Performance

Eight Ways to Increase GPIB System Performance Application Note 133 Eight Ways to Increase GPIB System Performance Amar Patel Introduction When building an automated measurement system, you can never have too much performance. Increasing performance

More information

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies

Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems November 3-6, 1999 in Cambridge Massachusetts, USA Characterizing the Performance of Dynamic Distribution

More information

The Piranha computer algebra system. introduction and implementation details

The Piranha computer algebra system. introduction and implementation details : introduction and implementation details Advanced Concepts Team European Space Agency (ESTEC) Course on Differential Equations and Computer Algebra Estella, Spain October 29-30, 2010 Outline A Brief Overview

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

The Methodology of Application Development for Hybrid Architectures

The Methodology of Application Development for Hybrid Architectures Computer Technology and Application 4 (2013) 543-547 D DAVID PUBLISHING The Methodology of Application Development for Hybrid Architectures Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok Department

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

Microcontroller-based experiments for a control systems course in electrical engineering technology

Microcontroller-based experiments for a control systems course in electrical engineering technology Microcontroller-based experiments for a control systems course in electrical engineering technology Albert Lozano-Nieto Penn State University, Wilkes-Barre Campus, Lehman, PA, USA E-mail: AXL17@psu.edu

More information

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age. Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement

More information

A Cloud Computing Approach for Big DInSAR Data Processing

A Cloud Computing Approach for Big DInSAR Data Processing A Cloud Computing Approach for Big DInSAR Data Processing through the P-SBAS Algorithm Zinno I. 1, Elefante S. 1, Mossucca L. 2, De Luca C. 1,3, Manunta M. 1, Terzo O. 2, Lanari R. 1, Casu F. 1 (1) IREA

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

White Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario

White Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario White Paper February 2010 IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario 2 Contents 5 Overview of InfoSphere DataStage 7 Benchmark Scenario Main Workload

More information