Analyse de performances pour les systèmes intégrés multi-cœurs

Size: px
Start display at page:

Download "Analyse de performances pour les systèmes intégrés multi-cœurs"

Transcription

1 Ben Salma SANA Laboratoire TIMA SLS Analyse de performances pour les systèmes intégrés multi-cœurs Encadrants: Frederic Petrot Nicolas Fournel Page 1

2 1 - Introduction: Computer processor design has evolved at a constant pace for the last 20 years. The proliferation of computers into the mass market and the tasks we ask of them continue to push the need for more powerful processors. The market requirement for higher performing processors is linked to the demand for more sophisticated software applications. , for instance, which is now used globally, was only a limited and expensive technology 10 years ago. Today, software applications span everything from helping large corporations better manage and protect their business-critical data and networks to allowing PCs in the home to edit home videos, manipulate digital photographs, and burn downloaded music to CDs. Multi-core processors represent a major evolution in computing technology. This important development is coming at a time when businesses and consumers are beginning to require the benefits offered by these processors due to the exponential growth of digital data and the globalization of the Internet. Multi-core processors will eventually become the pervasive computing model because they offer performance and productivity benefits beyond the capabilities of today s single-core processors. 2 Context: Benchmarks are necessary to experimentally determine the benefits of new designs. The presence of parallel machines is now ubiquitous. Nevertheless, existing benchmarks have a number of limitations and will not be very useful to evaluate future CMPs. The lack of benchmark suites can hamper parallel architecture research and reduce its impact. The Princeton Application Repository for Shared-Memory Computers (PARSEC) was created to be useful for the design of next generation of processors. Parsec is benchmark suite for chip multiprocessor that started as cooperation between Intel and Princeton University. It offers new application areas for parallel machines through proliferation of CMPs. Parsec was used with many processors such as i386, x86_64 and Sparc but not with the ARM processors. During this semester, my TER project was in the research laboratory TIMA carrying out basic research in the field of Electronics: researches aim at the design, architecture, test, CAD, quality of integrated circuits and by means of data processing and microelectronics. My research project topics were to adapt Parsec benchmarks to ARM architecture throw a Cross Compilation and execute them on ARM. Page 2

3 3- What is Cross Compilation? A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is run. Cross compiler tools are used to generate executables for embedded system or multiple platforms. It is used to compile for a platform upon which it is not feasible to do the compiling, like microcontrollers that don't support an operating system. It has become more common to use this tool for paravirtualization where a system may have one or more platforms in use. Before doing the cross compilation I started with compiling and executing Benchmark Parsec on the local machine (Linux Intel) using the Gnu compiler gcc. Then I compiled the code with arm-none-linux-gnueabi-gcc to execute it finally on the ARM machine. 4 The benchmarks Parsec: 4.1. General Presentation: The Princeton Application Repository for Shared-Memory Computers is a benchmark suite composed of multithreaded programs. It differs from other benchmark suites by different characteristics: Multithreaded: PARSEC uses different ways of parallelization (threads, pipeline ). Diverse: PARSEC does try to explore different application domains such as financial analysis, Computer vision, animation, similarity search Not HPC-Focused: The PARSEC suite anticipates the development of parallel programs in the domain of High-Performance Computing and tries to avoid a program selection which is skewed towards HPC. It focuses on programs from all domains, such as desktop and server applications. Research: The suite is primarily intended for research. It can also be used for performance measurements of real machines, but its original purpose is insight, not numbers. Benchmark suites intended for research usually go beyond pure scoring systems and provide infrastructure to instrument and manipulate included programs in an efficient manner. Page 3

4 In the section bellow I will give deeper explanation of Parsec application starting with Blackscholes application Parsec applications: Blackscholes: The Black-Scholes equation is a differential equation that describes how, under a certain set of assumptions, the value of an option changes as the price of the underlying asset changes. Parallelization: The parallelization algorithms are very simple: they simply price multiple options in parallel using Black-Scholes formula. Each thread prices an individual option. In practice financial houses price 10.s to 100.s of thousands of options using Black-Scholes. Caractérisation des Blackscholes Bodytrack : The bodytrack computer vision application is an Intel RMS workload which tracks a 3D pose of a marker-less human body with multiple cameras through an image sequence. Bodytrack employs an annealed particle filter to track the pose using edges and the foreground silhouette as image features, based on a 10 segment 3D kinematic tree body model. These two image features were chosen because they exhibit a high degree of invariance under a wide range of conditions and because they are easy to extract. An annealed particle filter was employed in order to be able to search high dimensional configuration spaces without having to rely on any assumptions of the tracked body such Page 4

5 as the existence of markers or constrained movements. This benchmark was included due to the increasing significance of computer vision algorithms in areas such as video surveillance, character animation and computer interfaces. This work is based upon the following works : -Articulated Body Motion Capture by Stochastic Search Jonathan Deutscher and Ian Reid -A. Balan, L. Sigal and M. Black. A Quantitative Evaluation of video-based 3D Person Tracking. Output of Bodytrack (Frame 1) Parallelization: The program has three parallel kernels: Edge detection: bodytrack employs a gradient based edge detection mask to find edges. The result is compared against a threshold to eliminate spurious edges. Edge detection is implemented in function GradientMagThreshold. The output of this kernel will be further refined before it is used to compute the particle weights. Page 5

6 Edge smoothing: A separable Gaussian filter of size 7 7 pixels is used to smooth the edges in function GaussianBlur. The result is remapped between 0 and 1 to produce a pixel map in which the value of each pixel is related to its distance from an edge. The kernel has two parallel phases, one to filter image rows and one to filter image columns. Calculate particle weights: This kernel evaluates the foreground silhouette and the image edges produced earlier to compute the weights for the particles. This kernel is executed once for every annealing layer during every time step, making it the computationally most intensive part of the body tracker. The parallel kernels use tickets to distribute the work among threads balance the load dynamically. The ticketing mechanism is implemented in class TicketDispenser and behaves like a shared counter. Bodytrack characteristics Page 6

7 Canneal : This kernel was developed by Princeton University. It uses cache-aware simulated annealing to minimize the routing cost of a chip design. Simulated annealing is a common method to approximate the global optimum in a large search space. Canneal pseudo-randomly picks pairs of elements and tries to swap them. The program was included in the PARSEC program selection to represent engineering workloads, for the fine-grained parallelism with its lock-free synchronization techniques and due to its pseudo-random worst-case memory access pattern. Canneal uses a very aggressive synchronization strategy that is based on data race recovery instead of avoidance. Pointers to the elements are dereferenced and swapped atomically, but no locks are held while a potential swap is evaluated. This can cause disadvantageous swaps if one of the relevant elements has been replaced by another thread during that time. This equals a higher effective probability to accept swaps which increase the routing cost, and the simulated annealing method automatically recovers from it. The swap operation employs lock-free synchronization which is implemented with atomic instructions. An alternative implementation which relied on conventional locks turned out to be too inefficient due to excessive locking overhead. The synchronization routines with the atomic instructions are taken from the BSD kernel. Support for most new architectures can be added easily by copying the correct header file from the BSD kernel sources. Canneal characteristics Page 7

8 Dedup: The dedup kernel was developed by Princeton University. It compresses a data stream with a combination of global compression and local compression in order to achieve high compression ratios. Such a compression is called deduplication. The reason for the inclusion of this kernel is that deduplication has become a mainstream method to compress storage footprints for new-generation backup storage systems and to compress communication data for newgeneration bandwidth optimized networking appliances. The kernel uses a pipelined programming model to parallelize the compression to mimic real-world implementations. Parallelization: There are five pipeline stages the intermediate three of which are parallel. In the first stage, dedup reads the input stream and breaks it up into coarse-grained chunks to get independent work units for the threads. The second stage anchors each chunk into fine-grained small segments with rolling fingerprinting. The third pipeline stage computes a hash value for each data segment. The fourth stage compresses each data segment with the Ziv-Lempel algorithm and builds a global hash table that maps hash values to data. The final stage assembles the deduplicated output stream consisting of hash values and compressed data segments. Dedup uses a separate thread pool for each parallel pipeline stage. Each thread pool should at least have a number of threads equal to the number of available cores to allow the system to fully work on any stage should the need arise. The operating system scheduler is responsible for a thread schedule which will maximize the overall throughput of the pipeline. In order to avoid lock contention, the number of queues is scaled with the number of threads, with a small group of threads sharing an input and output queue at a time. Dedup characteristics Page 8

9 Facesim: This Intel RMS application was originally developed by Stanford University. It takes a model of a human face and a time sequence of muscle activations and computes a visually realistic animation of the modeled face by simulating the underlying physics. The goal is to create a visually realistic result. Certain effects such as inertial movements would have only a small visible effect and are not simulated. The workload was included in the benchmark suite because an increasing number of computer games and other forms of animation employ physical simulation to create more realistic virtual environment. Human faces in particular are observed with more attention from users than other details of a virtual world, making their realistic presentation a key element for animations. The parallelization uses a static partitioning of the mesh. Data that spans nodes belonging to more than one partition is replicated. Every time step the partitions process all elements that contain at least one node owned by the particle, but only results for nodes which are owned by the partition are written. Facesim characteristics Page 9

10 Ferret: This application is based on the Ferret toolkit which is used for content-based similarity search of feature-rich data such as audio, images, video, 3D shapes and so on. It was developed by Princeton University. The reason for the inclusion in the benchmark is that it represents emerging next generation desktop and Internet search engines for non-text document data types. In the benchmark, we have configured the Ferret toolkit for image similarity search. Ferret is parallelized using the pipeline model with six stages. The first and the last stage are for input and output. The middle four stages are for query image segmentation, feature extraction, indexing of candidate sets with multi-probe Locality Sensitive Hashing (LSH) and ranking. Each stage has its own thread pool and the basic work unit of the pipeline is a query image. Ferret characteristics Page 10

11 Fluidanimate: This Intel RMS application uses an extension of the Smoothed Particle Hydrodynamics (SPH) method to simulate an incompressible fluid for interactive animation purposes. Its output can be visualized by detecting and rendering the surface of the fluid. The force density fields are derived directly from the Navier-Stokes equation. Fluidanimate uses special-purpose kernels to increase stability and speed. Fluidanimate was included in the PARSEC benchmark suite because of the increasing significance of physics simulations for computer games and other forms of real-time animations. The parallelization algorithm provided is based on spatial partitioning. As mentioned above, particles are sorted spatially into a uniform grid which covers the entire simulation domain. This grid is evenly partitioned along cell boundaries using axis-aligned splitting planes to produce subgrids, one per thread. Since particles which reside in adjacent cells may interact with each other, multiple threads may need to update the cells which lie on sub-grid boundaries. Page 11

12 Each grid cell has a flag which indicates whether or not it lies on a subgrid boundary. Whenever a particle in a boundary cell needs to be updated, fine-grain locks are used for that update. Locks are not used when updating particles away from boundaries since these updates will never occur concurrently. Fluidanimate characteristics Freqmine: Frequent Itemsets Mining (FIM) is the basis of Association Rule Mining. Association Rule Mining is the process of analyzing a set of transactions to extract association rules. Association Rule Mining is a very common used and well-studied data mining problem. The mining is applicable to any sequential and time series data via discretization. Example domains are protein sequences, market data, web logs, text, music, stock market, etc. To mine Association Rule Mining is converted to mine the frequent itemsets Lk, which contains the frequent itemsets of length k. Many FIMI (FIM Implementation) algorithms have been proposed in the literature, including FP-growth and Apriori based approaches. Page 12

13 Researches showed that the FP-growth can get much faster than some old algorithms like the Apriori based approaches except in some cases the FP-tree can be too large to be stored in memory when the database size is so large or the database is too sparse. Freqmine has been parallelized with OpenMP. It employs three parallel kernels: Build FP-tree header, Construct prefix tree and Mine data. Freqmine characteristics Page 13

14 Streamcluster: This RMS kernel was developed by Princeton University and solves the online clustering problem: For a stream of input points, it finds a predetermined number of medians so that each point is assigned to its nearest center. The quality of the clustering is measured by the sum of squared distances (SSQ) metric. Stream clustering is a common operation where large amounts or continuously produced data has to be organized under real-time conditions, for example network intrusion detection, pattern recognition and data mining. The program spends most of its time evaluating the gain of opening a new center. This operation uses a parallelization scheme which employs static partitioning of data points. The program is memory bound for low-dimensional data and becomes increasingly computationally intensive as the dimensionality increases. Due to its online character the working set size of the algorithm can be chosen independently from the input data. Streamcluster was included in the PARSEC benchmark suite because of the importance of data mining algorithms and the prevalence of problems with streaming characteristics. Streamcluster characteristics Page 14

15 Swaption: The swaptions application is an Intel RMS workload which uses the Heath-Jarrow-Morton (HJM) framework to price a portfolio of swaptions. The HJM framework describes how interest rates evolve for risk management and asset liability management for a class of models. Its central insight is that there is an explicit relationship between the drift and volatility parameters of the forward-rate dynamics in a noarbitrage market. Because HJM models are non-markovian the analytic approach of solving the PDE to price a derivative cannot be used. Swaptions therefore employs Monte Carlo (MC) simulation to compute the prices. The workload was included in the benchmark suite because of the significance of PDEs and the wide use of Monte Carlo simulation. The program stores the portfolio in the swaptions array. Each entry corresponds to one derivative. Swaptions partitions the array into a number of blocks equal to the number of threads and assigns one block to every thread. Each thread iterates through all swaptions in the work unit it was assigned and calls the function HJM Swaption Blocking for every entry in order to compute the price. Swaption characteristics Vips: This application is based on the VASARI Image Processing System (VIPS) which was originally developed through several projects funded by European Union (EU) grants. The benchmark version is derived from a print on demand service that is offered at the National Gallery of London, which is also the current maintainer of the system. Page 15

16 The benchmark includes fundamental image operations such as an affine transformation and a convolution. It was chosen because image transformations are a common task on desktop computers and for the ability of the VASARI system to construct multi-threaded image processing pipelines transparently on the fly. Future libraries might use concepts such as the ones employed by VIPS to make multi-threaded functionality available to the user. The image transformation pipeline of the vips benchmark has 18 stages. It is implemented in the VIPS operation in benchmark. Vips characteristics Page 16

17 x264: The x264 application is an H.264/AVC (Advanced Video Coding) video encoder. In the 4th annual video codec comparison it was ranked 2nd best codec for its high encoding quality. It is based on the ITU-T H.264 standard which was completed in May 2003 and which is now also part of ISO/IEC MPEG-4. In that context the standard is also known as MPEG-4 Part 10. H.264 describes the lossy compression of a video stream. It improves over previous video encoding standards with new features such as increased sample bit depth precision, higherresolution color information, variable block-size motion compensation (VBSMC) or contextadaptive binary arithmetic coding (CABAC). These advancements allow H.264 encoders to achieve a higher output quality with a lower bit-rate at the expense of a significantly increased encoding and decoding time. The flexibility of H.264 allows its use in a wide range of contexts with different requirements, from video conferencing solutions to high-definition (HD) movie distribution. Next generation HD DVD or Blu-ray video players already require H.264/AVC encoding. The flexibility and wide range of application of the H.264 standard and its ubiquity in next generation video systems are the reasons for the inclusion of x264 in the PARSEC benchmark suite. The parallel algorithm of x264 uses the pipeline model with one stage per input video frame. This results in a virtual pipeline with as many stages as there are input frames. x264 processes a number of pipeline stages equal to the number of encoder threads in parallel, resulting in a sliding window which moves from the beginning of the pipeline to its end. For P- and B-Frames the encoder requires the image data and motion vectors from the relevant region of the reference frames in order to encode the current frame, and so each stage makes this information available as it is calculated during the encoding process. Fast upward movements can thus cause delays which can limit the achievable speedup of x264 in practice. In order to compensate for this effect, the parallelization model requires that x264 is executed with a number of threads greater than the number of cores to achieve maximum performance. Page 17

18 X264 characteristics Raytrace: Raytracing is a technique that generates a visually realistic image by tracing the path of light through a scene. It leverages the physical property that the path of light is always reversible to reduce the computational requirements by following the light rays from the eye point through the image plane to the source of the light. This way only light rays that contribute to the image are considered. The raytrace benchmark program uses a variety of the raytracing method that would typically be employed for real-time animations such as computer games. It is optimized for speed rather than realism. The computational complexity of the algorithm depends on the resolution of the output image and the scene. For more details about the specifics of benchmarks Parsec: Page 18

19 5. Project Milestones: In the beginning of my TER I started with reading documentations and trying to understand the goals of the subject. It was the most difficult part of the work. The subject treats new concepts that I didn t know and which are varied and complex. First the understanding of what is a benchmark and why they are useful then I started to explore the Parsec applications and that was very difficult because they treat different domains and use varied types of parallelization. Trying to understand Parsec took the biggest part of my project and I know that I didn t really understand the real use of each benchmark it is so complicated and it takes more then a half day per week. Then I started to discover the cross compilation. I first began with compiling and executing the code on the local machine (Intel Linux) and add the necessary libraries. Finally I made the cross compilation of the code using arm-none-linux-gnueabi-gcc and then simulate it on a four core ARM machine. I tried to execute all the benchmarks but I still have some benchmarks that didn t yet work on the ARM machine. The last step of the project was to analyze performance of ARM with benchmarks Parsec but I didn t have time to make this step. Analyze needs a good comprehension of each benchmark and its inputs (the size of benchmarks parsec is about 6 Go). 6. Conclusion: This project was very useful for me. I learned a lot of new concepts: what is Cross Compilation why we use it and how to cross compile a program for a specific architecture; it also improved my knowledge about scripting. Then I learned about the use of benchmarks and how important they are for the analysis of the performance of an architecture and how it gives the possibility of comparing the performance of different architecture. It was also very hard because benchmark Parsec use many technologies in many domains such as financial analysis, media processing, animation, visualization I find a big difficulty in understanding the use of each benchmark so I spent a lot of time on reading specifications. References: The site of Parsec: Page 19

PARSEC vs. SPLASH 2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors

PARSEC vs. SPLASH 2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors PARSEC vs. SPLASH 2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors ChristianBienia,SanjeevKumar andkaili DepartmentofComputerScience,PrincetonUniversity MicroprocessorTechnologyLabs,Intel

More information

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder Performance Analysis and Comparison of 15.1 and H.264 Encoder and Decoder K.V.Suchethan Swaroop and K.R.Rao, IEEE Fellow Department of Electrical Engineering, University of Texas at Arlington Arlington,

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

Everything you need to know about flash storage performance

Everything you need to know about flash storage performance Everything you need to know about flash storage performance The unique characteristics of flash make performance validation testing immensely challenging and critically important; follow these best practices

More information

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

More information

Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

More information

Managing Performance vs. Accuracy Trade-offs With Loop Perforation

Managing Performance vs. Accuracy Trade-offs With Loop Perforation Managing Performance vs. Accuracy Trade-offs With Loop Perforation Stelios Sidiroglou Sasa Misailovic Henry Hoffmann Martin Rinard Computer Science and Artificial Intelligence Laboratory Massachusetts

More information

Emerging Markets for H.264 Video Encoding

Emerging Markets for H.264 Video Encoding Leveraging High Definition and Efficient IP Networking WHITE PAPER Introduction Already dominant in traditional applications such as video conferencing and TV broadcasting, H.264 Advanced Video Coding

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

How To Build A Cloud Computer

How To Build A Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

White paper. H.264 video compression standard. New possibilities within video surveillance.

White paper. H.264 video compression standard. New possibilities within video surveillance. White paper H.264 video compression standard. New possibilities within video surveillance. Table of contents 1. Introduction 3 2. Development of H.264 3 3. How video compression works 4 4. H.264 profiles

More information

Parallelism and Cloud Computing

Parallelism and Cloud Computing Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Develop Computer Animation

Develop Computer Animation Name: Block: A. Introduction 1. Animation simulation of movement created by rapidly displaying images or frames. Relies on persistence of vision the way our eyes retain images for a split second longer

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Chirag Gupta,Sumod Mohan K cgupta@clemson.edu, sumodm@clemson.edu Abstract In this project we propose a method to improve

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Reducing Backups with Data Deduplication

Reducing Backups with Data Deduplication The Essentials Series: New Techniques for Creating Better Backups Reducing Backups with Data Deduplication sponsored by by Eric Beehler Reducing Backups with Data Deduplication... 1 Explaining Data Deduplication...

More information

The Data Center as a Grid Load Stabilizer

The Data Center as a Grid Load Stabilizer The Data Center as a Grid Load Stabilizer Hao Chen *, Michael C. Caramanis ** and Ayse K. Coskun * * Department of Electrical and Computer Engineering ** Division of Systems Engineering Boston University

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

Manual for Human Tracking Software

Manual for Human Tracking Software Manual for Human Tracking Software Alexandru Balan Department of Computer Science Brown University Providence, Rhode Island 02912 alb@cs.brown.edu Version 1.0 November 26, 2005 1. Abstract A novel human

More information

SQL Server 2012 Performance White Paper

SQL Server 2012 Performance White Paper Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format

Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format , pp.91-100 http://dx.doi.org/10.14257/ijhit.2014.7.4.09 Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format Jingjing Zheng 1,* and Ting Wang 1, 2 1,* Parallel Software and Computational

More information

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

More information

CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, slee2010@stanford.edu June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

Video Analytics A New Standard

Video Analytics A New Standard Benefits The system offers the following overall benefits: Tracker High quality tracking engine UDP s embedded intelligent Video Analytics software is fast becoming the standard for all surveillance and

More information

Visualizing molecular simulations

Visualizing molecular simulations Visualizing molecular simulations ChE210D Overview Visualization plays a very important role in molecular simulations: it enables us to develop physical intuition about the behavior of a system that is

More information

Master of Mathematical Finance: Course Descriptions

Master of Mathematical Finance: Course Descriptions Master of Mathematical Finance: Course Descriptions CS 522 Data Mining Computer Science This course provides continued exploration of data mining algorithms. More sophisticated algorithms such as support

More information

MMGD0203 Multimedia Design MMGD0203 MULTIMEDIA DESIGN. Chapter 3 Graphics and Animations

MMGD0203 Multimedia Design MMGD0203 MULTIMEDIA DESIGN. Chapter 3 Graphics and Animations MMGD0203 MULTIMEDIA DESIGN Chapter 3 Graphics and Animations 1 Topics: Definition of Graphics Why use Graphics? Graphics Categories Graphics Qualities File Formats Types of Graphics Graphic File Size Introduction

More information

How To Test Video Quality With Real Time Monitor

How To Test Video Quality With Real Time Monitor White Paper Real Time Monitoring Explained Video Clarity, Inc. 1566 La Pradera Dr Campbell, CA 95008 www.videoclarity.com 408-379-6952 Version 1.0 A Video Clarity White Paper page 1 of 7 Real Time Monitor

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark Data-parallel Acceleration of PARSEC Black-Scholes Benchmark AUGUST ANDRÉN and PATRIK HAGERNÄS KTH Information and Communication Technology Bachelor of Science Thesis Stockholm, Sweden 2013 TRITA-ICT-EX-2013:158

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

Key Components of WAN Optimization Controller Functionality

Key Components of WAN Optimization Controller Functionality Key Components of WAN Optimization Controller Functionality Introduction and Goals One of the key challenges facing IT organizations relative to application and service delivery is ensuring that the applications

More information

: Application Layer. Factor the Content. Bernd Paysan. EuroForth 2011, Vienna

: Application Layer. Factor the Content. Bernd Paysan. EuroForth 2011, Vienna net : Application Layer Factor the Content Bernd Paysan EuroForth 2011, Vienna Outline Motivation Requirements Solutions Some Basic Insights Factor Data Distribute the Code net2o Recap: Lower Level Parts

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 15 th January 2014 Al Chrosny Director, Software Engineering TreeAge Software, Inc. achrosny@treeage.com Andrew Munzer Director, Training and Customer

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

Video Coding Technologies and Standards: Now and Beyond

Video Coding Technologies and Standards: Now and Beyond Hitachi Review Vol. 55 (Mar. 2006) 11 Video Coding Technologies and Standards: Now and Beyond Tomokazu Murakami Hiroaki Ito Muneaki Yamaguchi Yuichiro Nakaya, Ph.D. OVERVIEW: Video coding technology compresses

More information

PARALLEL PROGRAMMING

PARALLEL PROGRAMMING PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Parametric Comparison of H.264 with Existing Video Standards

Parametric Comparison of H.264 with Existing Video Standards Parametric Comparison of H.264 with Existing Video Standards Sumit Bhardwaj Department of Electronics and Communication Engineering Amity School of Engineering, Noida, Uttar Pradesh,INDIA Jyoti Bhardwaj

More information

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points

The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points Kay Sripanidkulchai, Aditya Ganjam, Bruce Maggs, and Hui Zhang Instructor: Fabian Bustamante Presented

More information

Demystifying Deduplication for Backup with the Dell DR4000

Demystifying Deduplication for Backup with the Dell DR4000 Demystifying Deduplication for Backup with the Dell DR4000 This Dell Technical White Paper explains how deduplication with the DR4000 can help your organization save time, space, and money. John Bassett

More information

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow , pp.233-237 http://dx.doi.org/10.14257/astl.2014.51.53 A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow Giwoo Kim 1, Hye-Youn Lim 1 and Dae-Seong Kang 1, 1 Department of electronices

More information

H 261. Video Compression 1: H 261 Multimedia Systems (Module 4 Lesson 2) H 261 Coding Basics. Sources: Summary:

H 261. Video Compression 1: H 261 Multimedia Systems (Module 4 Lesson 2) H 261 Coding Basics. Sources: Summary: Video Compression : 6 Multimedia Systems (Module Lesson ) Summary: 6 Coding Compress color motion video into a low-rate bit stream at following resolutions: QCIF (76 x ) CIF ( x 88) Inter and Intra Frame

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

TPCalc : a throughput calculator for computer architecture studies

TPCalc : a throughput calculator for computer architecture studies TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University pierre.michaud@inria.fr Stijn.Eyerman@elis.UGent.be

More information

Application Performance Analysis of the Cortex-A9 MPCore

Application Performance Analysis of the Cortex-A9 MPCore This project in ARM is in part funded by ICT-eMuCo, a European project supported under the Seventh Framework Programme (7FP) for research and technological development Application Performance Analysis

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information

Grid Computing for Artificial Intelligence

Grid Computing for Artificial Intelligence Grid Computing for Artificial Intelligence J.M.P. van Waveren May 25th 2007 2007, Id Software, Inc. Abstract To show intelligent behavior in a First Person Shooter (FPS) game an Artificial Intelligence

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

SCALABILITY AND AVAILABILITY

SCALABILITY AND AVAILABILITY SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase

More information

Mean-Shift Tracking with Random Sampling

Mean-Shift Tracking with Random Sampling 1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of

More information

Remote Graphical Visualization of Large Interactive Spatial Data

Remote Graphical Visualization of Large Interactive Spatial Data Remote Graphical Visualization of Large Interactive Spatial Data ComplexHPC Spring School 2011 International ComplexHPC Challenge Cristinel Mihai Mocan Computer Science Department Technical University

More information

Deploying De-Duplication on Ext4 File System

Deploying De-Duplication on Ext4 File System Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College

More information

ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity

ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity TANIMA DEY, WEI WANG, JACK W. DAVIDSON, and MARY LOU SOFFA, University of Virginia To utilize the full

More information

INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT

INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT Department of computer science and engineering, Sogang university 2008. 7. 22 Deukhyun Cha INTEL PERFORMANCE LIBRARY: INTEGRATED PERFORMANCE

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012 MapReduce and Hadoop Aaron Birkland Cornell Center for Advanced Computing January 2012 Motivation Simple programming model for Big Data Distributed, parallel but hides this Established success at petabyte

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering Internet Firewall CSIS 4222 A combination of hardware and software that isolates an organization s internal network from the Internet at large Ch 27: Internet Routing Ch 30: Packet filtering & firewalls

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

LONG BEACH CITY COLLEGE MEMORANDUM

LONG BEACH CITY COLLEGE MEMORANDUM LONG BEACH CITY COLLEGE MEMORANDUM DATE: May 5, 2000 TO: Academic Senate Equivalency Committee FROM: John Hugunin Department Head for CBIS SUBJECT: Equivalency statement for Computer Science Instructor

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

Hardware design for ray tracing

Hardware design for ray tracing Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex

More information

Titolo del paragrafo. Titolo del documento - Sottotitolo documento The Benefits of Pushing Real-Time Market Data via a Web Infrastructure

Titolo del paragrafo. Titolo del documento - Sottotitolo documento The Benefits of Pushing Real-Time Market Data via a Web Infrastructure 1 Alessandro Alinone Agenda Introduction Push Technology: definition, typology, history, early failures Lightstreamer: 3rd Generation architecture, true-push Client-side push technology (Browser client,

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...

More information

NAPCS Product List for NAICS 51219: Post Production Services and Other Motion Picture and Video Industries

NAPCS Product List for NAICS 51219: Post Production Services and Other Motion Picture and Video Industries National 51219 1 Postproduction Providing computerized and electronic image and sound processing (film, video, digital media, etc.). Includes editing, transfer, color correction, digital restoration, visual

More information

Broadband Networks. Prof. Karandikar. Department of Electrical Engineering. Indian Institute of Technology, Bombay. Lecture - 26

Broadband Networks. Prof. Karandikar. Department of Electrical Engineering. Indian Institute of Technology, Bombay. Lecture - 26 Broadband Networks Prof. Karandikar Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 26 Optical Network &MPLS So, as you were discussing in the previous lectures, next

More information

Analecta Vol. 8, No. 2 ISSN 2064-7964

Analecta Vol. 8, No. 2 ISSN 2064-7964 EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

PERFORMANCE MODELS FOR APACHE ACCUMULO:

PERFORMANCE MODELS FOR APACHE ACCUMULO: Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHAREDNOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc. I M NOT ADAM FUCHS But perhaps

More information

A Deduplication-based Data Archiving System

A Deduplication-based Data Archiving System 2012 International Conference on Image, Vision and Computing (ICIVC 2012) IPCSIT vol. 50 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V50.20 A Deduplication-based Data Archiving System

More information

Video compression: Performance of available codec software

Video compression: Performance of available codec software Video compression: Performance of available codec software Introduction. Digital Video A digital video is a collection of images presented sequentially to produce the effect of continuous motion. It takes

More information

Understanding Compression Technologies for HD and Megapixel Surveillance

Understanding Compression Technologies for HD and Megapixel Surveillance When the security industry began the transition from using VHS tapes to hard disks for video surveillance storage, the question of how to compress and store video became a top consideration for video surveillance

More information

evm Virtualization Platform for Windows

evm Virtualization Platform for Windows B A C K G R O U N D E R evm Virtualization Platform for Windows Host your Embedded OS and Windows on a Single Hardware Platform using Intel Virtualization Technology April, 2008 TenAsys Corporation 1400

More information

CSE 237A Final Project Final Report

CSE 237A Final Project Final Report CSE 237A Final Project Final Report Multi-way video conferencing system over 802.11 wireless network Motivation Yanhua Mao and Shan Yan The latest technology trends in personal mobile computing are towards

More information

Interactive Multimedia Courses-1

Interactive Multimedia Courses-1 Interactive Multimedia Courses-1 IMM 110/Introduction to Digital Media An introduction to digital media for interactive multimedia through the study of state-of-the-art methods of creating digital media:

More information

Discovering Computers 2008. Chapter 3 Application Software

Discovering Computers 2008. Chapter 3 Application Software Discovering Computers 2008 Chapter 3 Application Software Chapter 3 Objectives Identify the categories of application software Explain ways software is distributed Explain how to work with application

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department

More information

Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources

Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources Wei Wang, Tanima Dey, Jason Mars, Lingjia Tang, Jack Davidson, Mary Lou Soffa Department of Computer Science University

More information

White Paper. Recording Server Virtualization

White Paper. Recording Server Virtualization White Paper Recording Server Virtualization Prepared by: Mike Sherwood, Senior Solutions Engineer Milestone Systems 23 March 2011 Table of Contents Introduction... 3 Target audience and white paper purpose...

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information