supernova, a multiprocessor-aware synthesis server for SuperCollider
Tim BLECHMANN
Vienna, Austria

Abstract

SuperCollider [McCartney, 1996] is a modular computer music system, based on an object-oriented real-time scripting language and a standalone synthesis server. supernova is a new implementation of the SuperCollider synthesis server, providing an extension for multi-threaded signal processing. By adding one class to the SuperCollider class library, the parallel signal processing capabilities are exposed to the user.

Keywords

SuperCollider, multi-processor, real-time

1 Introduction

In the last few years, multi- and many-core computer architectures have nearly replaced single-core CPUs. Increasing single-core performance requires higher power consumption, which would lower the performance per watt. The computer industry has therefore concentrated on increasing the number of CPU cores instead of single-core performance. These days, most mobile computers use dual-core processors, while workstations use up to quad-core processors with support for Simultaneous Multithreading.

Most computer-music applications still use a single-threaded programming model for signal processing. Although multi-processor aware signal processing was pioneered in the late 1980s, most notably with IRCAM's ISPW [Puckette, 1991], most systems in common use today have limited support for parallel signal processing. Recently, some systems have introduced limited multi-processor capabilities. PureData, for example, uses a static pipelining technique similar to the ISPW [Puckette, 2008], adding a delay of one block size (usually 64 samples) to transfer data between processors. Jack2 [Letz et al., 2005] can execute different clients in parallel, so one can manually distribute the signal processing load to different applications. This approach can also be used with multiple SuperCollider server instances controlled from the same language process.

This paper is divided into the following sections.
Section 2 gives an introduction to the difficulties of parallelizing computer music applications. Section 3 describes the general architecture of SuperCollider and the programming model of a SuperCollider signal graph. Section 4 introduces the concept of parallel groups to provide multi-processor support for SuperCollider signal graphs. Section 5 gives an overview of the supernova architecture, and Section 6 describes the current limitations.

2 Parallelizing Computer Music Systems

Parallelizing computer music systems is not a trivial task, because of several constraints.

2.1 Signal Graphs

In general, signal graphs are data-flow graphs, a form of directed acyclic graphs (DAGs), where each node does some signal processing based on the output of its predecessors. When trying to parallelize a signal graph, two aspects have to be considered.

Signal graphs may have a huge number of nodes, easily reaching several thousand. Traversing a huge graph in parallel is possible, but since the nodes are usually small building blocks ("unit generators") that process only a few samples, the synchronization overhead would outweigh the speedup of parallelizing the traversal. To cope with the scheduling overhead, Ulrich Reiter and Andreas Partzsch developed an algorithm for clustering a huge node graph for a certain number of threads [Reiter and Partzsch, 2007].

While in theory the only ordering constraint between graph nodes is the graph order (explicit order), many computer music systems introduce an implicit order, which is defined by the order in which nodes access shared resources. This implicit order changes with the algorithm that is used for traversing the graph, and it is undefined for a parallel graph traversal. Implicit ordering constraints make it especially difficult to parallelize signal graphs of Max-like [Puckette, 2002] computer music systems.

2.2 Realtime Constraints

Computer music systems are realtime systems with low-latency constraints. The most commonly used audio driver APIs use a pull model (see footnote 1): the driver calls a certain callback function at a monotonic rate, whenever the audio device provides and needs new data. The audio callback is a realtime function. If it doesn't meet the deadline and the audio data is delivered too late, a buffer under-run occurs, resulting in an audio dropout. The deadline itself depends on the driver settings, especially on the block size, which is typically between 64 and 2048 samples (roughly between 1 and 80 milliseconds).

For low-latency applications like computer-based instruments, playback latencies below 20 ms are desired [Magnusson, 2006]. For processing percussion instruments, round-trip latencies below 10 ms are required to ensure that the sound is perceived as a single event. Additional latencies may be introduced by buffering in hardware and by digital/analog conversion.

In order to meet the low-latency real-time constraints, a number of issues have to be dealt with. Most importantly, the realtime thread must never block, which could happen when using blocking data structures for synchronization or when allocating memory from the operating system.

3 SuperCollider

supernova is designed to be integrated into the SuperCollider system. This section gives an overview of the parts of SuperCollider's architecture that are relevant for parallel signal processing.
Footnote 1: e.g. Jack, CoreAudio, ASIO and Portaudio use a pull model.

3.1 SuperCollider Architecture

SuperCollider in its current version 3 [McCartney, 2002] is designed as a very modular system, consisting of the following parts (compare Figure 1):

[Figure 1: SuperCollider architecture. Components: IDE, sclang, scsynth / supernova, Unit Generators, GUI. Connection labels: pipe / C API, OSC, C API, C API / OSC.]

Language (sclang): SuperCollider is based on an object-oriented scripting language with a real-time garbage collector, which is inspired by Smalltalk. The language comes with a huge class library, specifically designed for computer music applications. It includes classes to control the synthesis server from the language.

Synthesis Server (scsynth): The SuperCollider server is the signal processing engine. It is controlled from the language using a simple OSC-based network interface.

Unit Generators: Unit generators (building blocks for the signal processing, like oscillators, filters etc.) are provided as plugins. These plugins are shared libraries with a C-based API that are loaded into the server at boot time.

IDE: There are a number of integrated development environments for SuperCollider source files. The original IDE is an OSX-only application, built solely as an IDE for SuperCollider. Since it is OSX only, several alternatives exist for other operating systems, including editor modes for Emacs, Vim and gedit. Besides that, there is a plugin for Eclipse and a Windows client called Psycollider.

GUI: To build a graphical user interface for SuperCollider, two solutions are widely used. On OSX, the application provides Cocoa-based widgets. For other operating systems, one can use SwingOSC, which is based on Java's Swing widget toolkit.

This modularity makes it easy to change some parts of the system while keeping the functionality of other parts. It is possible to use the server from other systems or languages, or to control multiple server instances from one language process. Users can extend the language by adding new classes or class methods, or write new unit generators (ugens).

3.2 SuperCollider Node Graph

The synthesis graph of the SuperCollider server is directly exposed to the language as a node graph. A node can be either a synth, i.e. a synthesis entity performing the signal processing, or a group, representing a list of nodes. The node graph is therefore internally represented as a tree, with groups at its root and inner nodes. Inside a group, nodes are executed sequentially, and the node ordering is exposed to the user. Nodes can be instantiated at a certain position inside a group or moved to a different location.

4 Multi-core support with supernova

supernova is designed as a replacement for the SuperCollider server scsynth. It implements the OSC interface of scsynth and can dynamically load unit generators. Unit generators for scsynth and supernova are not completely source compatible, though the API changes are very limited, so porting SuperCollider ugens to supernova is trivial. The API difference is described in detail in Section 5.3.

4.1 Parallel Groups

To make use of supernova's parallel execution engine, the concept of parallel groups has been introduced. The idea is that all nodes inside such a parallel group can be executed concurrently. Parallel groups can be nested, and their elements can be any other nodes (synths, groups and parallel groups). The concept of parallel groups has been implemented in the SuperCollider class library with a new PGroup class. Its interface is close to the interface of the Group class, but uses parallel groups on the server.
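The execution semantics of sequential and parallel groups can be illustrated with a small C++ model. This is only a sketch under stated assumptions: the names Node, Synth, Group, ParallelGroup and run_demo are hypothetical and not taken from supernova, and std::async is used purely for brevity; it is not suitable for a realtime audio thread.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of the node tree; none of these names come from supernova.
struct Node {
    virtual ~Node() = default;
    virtual void run() = 0;
};

// A synth performs some signal processing; here it just calls a function.
struct Synth : Node {
    std::function<void()> process;
    explicit Synth(std::function<void()> f) : process(std::move(f)) {}
    void run() override { process(); }
};

// A sequential group runs its children one after another, in list order.
struct Group : Node {
    std::vector<std::unique_ptr<Node>> children;
    void run() override {
        for (auto& child : children) child->run();
    }
};

// A parallel group may run its children concurrently; their relative order
// is unspecified. Children can be synths, groups or parallel groups.
struct ParallelGroup : Node {
    std::vector<std::unique_ptr<Node>> children;
    void run() override {
        std::vector<std::future<void>> pending;
        for (auto& child : children) {
            Node* n = child.get();
            pending.push_back(std::async(std::launch::async, [n] { n->run(); }));
        }
        for (auto& f : pending) f.get();  // wait until every child has finished
    }
};

// Builds a parallel group holding two sequential groups of two synths each,
// runs it once and returns the number of synths that were executed.
int run_demo() {
    std::atomic<int> counter{0};
    auto make_synth = [&counter] {
        return std::make_unique<Synth>([&counter] { ++counter; });
    };
    ParallelGroup root;
    for (int i = 0; i < 2; ++i) {
        auto group = std::make_unique<Group>();
        group->children.push_back(make_synth());
        group->children.push_back(make_synth());
        root.children.push_back(std::move(group));
    }
    root.run();
    return counter.load();
}
```

Calling run_demo() executes all four synths. The two inner groups may run at the same time, while the synths inside each group keep their order.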
Parallel groups can be safely emulated with traditional groups, making it easy to run code using the PGroup extension on the SuperCollider server.

4.2 Shared Resources

When writing SuperCollider code with the PGroup extension, access to shared resources (i.e. buses and buffers) needs some attention to avoid data races. While access to shared resources is ordered for nodes in sequential groups, this ordering constraint does not exist for parallel groups. For example, if a sequential group G contains the synths S1 and S2, which both write to the bus B, S1 writes to B before S2 if and only if S1 is located before S2. If they are placed in a parallel group P, it is undetermined whether S1 writes to B before or after S2. This may be an issue if the order of execution affects the resulting signal. Summing buses would be immune to this issue, though.

Shared resources are synchronized at ugen level in order to ensure data consistency. It is possible to read the same resource from different unit generators concurrently, but only one unit generator can write to a resource at any given time. If the order of resource access matters, the user is responsible for ensuring the correct execution order.

5 supernova Architecture

The architecture of supernova is not too different from scsynth, except for the node graph interpreter. It is reimplemented from scratch in the C++ programming language, making heavy use of the STL, the Boost libraries and template programming techniques. This section covers the design aspects that differ between the SuperCollider server and supernova.

5.1 DSP Threads

The multi-threaded DSP engine of supernova consists of the main audio callback thread and a number of worker threads, running with a realtime scheduling policy similar to that of the main audio thread. To synchronize these worker threads, the synchronization constraints that are usually imposed (see Section 2.2) have to be relaxed.
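A spin lock is one primitive that fits such relaxed constraints: a waiting thread keeps polling instead of being suspended by the operating system. A minimal sketch based on std::atomic_flag might look as follows; the SpinLock class and the count_with_spinlock helper are hypothetical names used for illustration, not supernova's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin lock; the paper does not show supernova's actual lock type.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // Busy-wait until the flag was previously clear. The calling thread
        // stays runnable and is never put to sleep by the lock itself.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Increments a shared counter from several threads under the spin lock
// and returns the final value.
int count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    int counter = 0;  // plain int, protected only by the spin lock
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Because a waiting thread burns CPU time while it spins, such a lock only pays off when the protected section is very short.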
Rather than avoiding blocking synchronization primitives altogether, the DSP threads are allowed to use spin locks for synchronizing with other realtime threads. Spin locks are blocking, but they don't suspend the calling thread. Since locking reduces the parts of the program that can run in parallel, it should be avoided whenever it is possible to use lock-free data structures for synchronization.

5.2 Node Scheduling

At startup, supernova creates a configurable number of worker threads. When the main audio callback is triggered, it wakes the worker threads. All audio threads process jobs from a queue of runnable items. Each queue item has an associated activation count, denoting the number of its predecessors that still have to run. After an item has been executed, the activation counts of its successors are decremented, and every successor whose count drops to zero is placed in the ready queue. The concept of using an activation count is similar to the implementation of Jack2 [Letz et al., 2005]. Items do not directly map to DSP graph nodes, since sequential graph nodes are combined to reduce the overhead of scheduling items. For example, a group that only contains synths would be scheduled as one item.

5.3 Unit Generator Interface

The plugin interface of SuperCollider is based on a simple C-style API. The API doesn't provide any support for resource synchronization, which has to be added to ensure the consistency of buffers and buses. supernova is therefore shipped with a patched version of the SuperCollider source tree, providing an API extension to cope with this limitation. This extension introduces reader-writer locks for buses and buffers, which need to be acquired exclusively for write access. The impact on the ugen code is limited, as the extension is implemented with C preprocessor macros that can easily be disabled. The adapted unit generator code for supernova can therefore easily be compiled for scsynth.

In order to port a SuperCollider unit generator to supernova, the following rules have to be followed:

- ugens that do not access any shared resources are binary compatible.
- ugens that access buses have to guard write access with a pair of ACQUIRE_BUS_AUDIO and RELEASE_BUS_AUDIO statements, and read access with ACQUIRE_BUS_AUDIO_SHARED and RELEASE_BUS_AUDIO_SHARED. Ugens like In, Out and their variations (e.g. ReplaceOut) fall into this category.
- ugens that access buffers (e.g. sampling or delay ugens) have to use pairs of ACQUIRE_SNDBUF or ACQUIRE_SNDBUF_SHARED and RELEASE_SNDBUF or RELEASE_SNDBUF_SHARED. In addition, LOCK_SNDBUF and LOCK_SNDBUF_SHARED can be used to create a RAII-style lock that automatically unlocks the buffer when it goes out of scope.
- To prevent deadlocks, one should not lock more than one resource type at the same time. When locking multiple resources of the same type, only resources with adjacent indices are allowed to be locked. The locking order has to be from low indices to high indices.

6 Limitations

At the time of writing, the implementation of supernova still imposes a few limitations.

6.1 Missing Features

The SuperCollider server provides some features that haven't been implemented in supernova yet. The most important missing feature is the non-realtime synthesis that is implemented by scsynth. In contrast to the realtime mode, OSC commands are not read from a network socket, but from a binary file that is usually generated from the language. Audio IO is done via soundfiles instead of physical audio devices.

Other missing features are ugen commands, which send specific commands to specific unit generators, and plugin commands, which add user-defined OSC handlers. While these features exist, they are used neither in the standard distribution nor in the sc3-plugins repository (see footnote 2).

6.2 Synchronization and Timing

As explained in Section 5.2, the audio callback wakes up the worker threads before working on the job queue. In order to avoid some scheduling overhead, the worker threads are not put to sleep until all jobs for the current signal block have been run. While this is reasonable if there are enough parallel jobs for the worker threads, it leads to busy-waiting during sequential parts of the signal graph. If no parallel groups are used, the worker threads wouldn't do any useful work, or even worse, they would prevent the operating system from running threads with a lower priority on their CPUs. This is friendly neither to other tasks nor to the environment (power consumption).
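The scheduling scheme of Section 5.2 and the busy-waiting behaviour described above can be sketched in C++ as follows. All names (Item, run_block, diamond_order_ok) are hypothetical, and the mutex-protected ready queue is a simplification chosen for readability; a realtime implementation would need a lock-free queue instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical queue item; the names are illustrative, not supernova's.
struct Item {
    std::function<void()> work;
    std::vector<std::size_t> successors;  // indices of dependent items
    int predecessor_count = 0;            // initial activation count
};

// Runs one signal block: every item executes exactly once, after all of its
// predecessors. Returns the number of items that were executed.
int run_block(const std::vector<Item>& items, unsigned num_workers) {
    std::vector<std::atomic<int>> activation(items.size());
    std::deque<std::size_t> ready;
    std::mutex ready_mutex;  // stand-in for a lock-free queue
    std::atomic<int> remaining{static_cast<int>(items.size())};
    std::atomic<int> executed{0};

    for (std::size_t i = 0; i < items.size(); ++i) {
        activation[i].store(items[i].predecessor_count);
        if (items[i].predecessor_count == 0) ready.push_back(i);
    }

    auto worker = [&] {
        // Threads are not put to sleep: they spin until the block is done.
        while (remaining.load() > 0) {
            std::size_t index = 0;
            {
                std::lock_guard<std::mutex> guard(ready_mutex);
                if (ready.empty()) continue;  // nothing runnable yet: busy-wait
                index = ready.front();
                ready.pop_front();
            }
            items[index].work();
            ++executed;
            // Decrement successors; those reaching zero become runnable.
            for (std::size_t s : items[index].successors) {
                if (activation[s].fetch_sub(1) == 1) {
                    std::lock_guard<std::mutex> guard(ready_mutex);
                    ready.push_back(s);
                }
            }
            --remaining;  // only after the successors have been enqueued
        }
    };

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < num_workers; ++w) threads.emplace_back(worker);
    worker();  // the thread standing in for the audio callback takes part too
    for (auto& t : threads) t.join();
    return executed.load();
}

// Diamond-shaped graph: 0 -> {1, 2} -> 3. Returns true if all four items ran,
// item 0 ran first and item 3 ran last.
bool diamond_order_ok() {
    std::mutex log_mutex;
    std::vector<int> log;
    auto record = [&](int id) {
        return [&log, &log_mutex, id] {
            std::lock_guard<std::mutex> guard(log_mutex);
            log.push_back(id);
        };
    };
    std::vector<Item> items(4);
    for (int i = 0; i < 4; ++i) items[i].work = record(i);
    items[0].successors = {1, 2};
    items[1].successors = {3};
    items[2].successors = {3};
    items[1].predecessor_count = 1;
    items[2].predecessor_count = 1;
    items[3].predecessor_count = 2;
    int executed = run_block(items, 2);
    return executed == 4 && log.size() == 4 && log.front() == 0 && log.back() == 3;
}
```

In the diamond graph, items 1 and 2 can run concurrently, while items 0 and 3 are sequential parts of the graph: during those, the remaining threads only spin in the while loop, which is the cost discussed in this section.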
To limit busy-waiting, the user can parallelize the signal processing as much as possible and adapt the number of worker threads to match the demands of the application and the number of available CPU cores.

On systems with a very low worst-case scheduling latency, it would be possible to put the worker threads to sleep instead of busy-waiting for the remaining jobs to finish. On a highly tweaked Linux system with the RT Preemption patches enabled, one should be able to get worst-case scheduling latencies below 20 microseconds [Rostedt and Hart, 2007], which would be acceptable, especially if a block size of more than 128 samples can be used. On my personal reference machines, worst-case scheduling latencies of 100 µs (Intel Core2 laptop) and 12 µs (Intel i7 workstation, with power management, frequency scaling and SMT disabled) can be achieved.

Footnote 2: The sc3-plugins repository (http://sc3-plugins.sourceforge.net/) is an svn repository of third-party unit generators that is maintained independently from SuperCollider.

7 Conclusions

supernova is a scalable solution for real-time computer music. It is tightly integrated with the SuperCollider computer music system, since it just replaces one of its components and adds the simple but powerful concept of parallel groups, exposing the parallelism explicitly to the user. While existing code doesn't make use of multiprocessor machines, it can easily be adapted. The advantage over other multiprocessor-aware computer music systems is its scalability. An application doesn't need to be optimized for a certain number of CPU cores, but can use as many cores as desired. It does not rely on pipelining techniques, so no latency is added to the signal. Resource access is not ordered implicitly, but has to be dealt with explicitly by the user.

Acknowledgements

I would like to thank James McCartney for developing SuperCollider and publishing it as open source software, Anton Ertl for the valuable feedback about this paper, and all my friends who supported me during the development of supernova.

References

Stéphane Letz, Yann Orlarey, and Dominique Fober. 2005. Jack audio server for multiprocessor machines. In Proceedings of the International Computer Music Conference.

Thor Magnusson. 2006. Affordances and constraints in screen-based musical instruments. In NordiCHI '06: Proceedings of the 4th Nordic conference on Human-computer interaction, New York, NY, USA. ACM.

James McCartney. 1996. SuperCollider, a new real time synthesis language. In Proceedings of the International Computer Music Conference.

James McCartney. 2002. Rethinking the Computer Music Language: SuperCollider. Computer Music Journal, 26(4).

Miller Puckette. 1991. FTS: A Real-time Monitor for Multiprocessor Music Synthesis. Computer Music Journal, 15(3).

Miller Puckette. 2002. Max at Seventeen. Computer Music Journal, 26(4).

Miller Puckette. 2008. Thoughts on Parallel Computing for Music. In Proceedings of the International Computer Music Conference.

Ulrich Reiter and Andreas Partzsch. 2007. Multi Core / Multi Thread Processing in Object Based Real Time Audio Rendering: Approaches and Solutions for an Optimization Problem. In Audio Engineering Society 122nd Convention.

Steven Rostedt and Darren V. Hart. 2007. Internals of the RT Patch. In Proceedings of the Linux Symposium.
jackdmp: Jack server for multi-processor machines Stéphane Letz, Yann Orlarey, Dominique Fober Grame, centre de création musicale Lyon, France
jackdmp: Jack server for multi-processor machines Stéphane Letz, Yann Orlarey, Dominique Fober Grame, centre de création musicale Lyon, France 1 Main objectives To take advantage of multi-processor architectures:
UNIT 1 OPERATING SYSTEM FOR PARALLEL COMPUTER
UNIT 1 OPERATING SYSTEM FOR PARALLEL COMPUTER Structure Page Nos. 1.0 Introduction 5 1.1 Objectives 5 1.2 Parallel Programming Environment Characteristics 6 1.3 Synchronisation Principles 1.3.1 Wait Protocol
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
jackdmp: Jack server for multi-processor machines
jackdmp: Jack server for multi-processor machines S.Letz, D.Fober, Y.Orlarey Grame - Centre national de création musicale {letz, fober, orlarey}@grame.fr Abstract jackdmp is a C++ version of the Jack low-latency
SEMS: The SIP Express Media Server. FRAFOS GmbH
SEMS: The SIP Express Media Server FRAFOS GmbH Introduction The SIP Express Media Server (SEMS) is a VoIP media and application platform for SIP based VoIP services. SEMS offers a wide selection of media
From Faust to Web Audio: Compiling Faust to JavaScript using Emscripten
From Faust to Web Audio: Compiling Faust to JavaScript using Emscripten Myles Borins Center For Computer Research in Music and Acoustics Stanford University Stanford, California United States, mborins@ccrma.stanford.edu
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
An Implementation Of Multiprocessor Linux
An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
Rackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
Benjamin San Souci & Maude Lemaire
Benjamin San Souci & Maude Lemaire introduction What is Node.js anyway? a complete software platform for scalable server-side and networking applications open-source under the MIT license comes bundled
Operating Systems 4 th Class
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
15-418 Final Project Report. Trading Platform Server
15-418 Final Project Report Yinghao Wang yinghaow@andrew.cmu.edu May 8, 214 Trading Platform Server Executive Summary The final project will implement a trading platform server that provides back-end support
Chapter 6, The Operating System Machine Level
Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General
evm Virtualization Platform for Windows
B A C K G R O U N D E R evm Virtualization Platform for Windows Host your Embedded OS and Windows on a Single Hardware Platform using Intel Virtualization Technology April, 2008 TenAsys Corporation 1400
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will
White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux
White Paper Real-time Capabilities for Linux SGI REACT Real-Time for Linux Abstract This white paper describes the real-time capabilities provided by SGI REACT Real-Time for Linux. software. REACT enables
Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get
Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6
Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6 Winter Term 2008 / 2009 Jun.-Prof. Dr. André Brinkmann Andre.Brinkmann@uni-paderborn.de Universität Paderborn PC² Agenda Multiprocessor and
Delivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
Implementing AUTOSAR Scheduling and Resource Management on an Embedded SMT Processor
Implementing AUTOSAR Scheduling and Resource Management on an Embedded SMT Processor Florian Kluge, Chenglong Yu, Jörg Mische, Sascha Uhrig, Theo Ungerer University of Augsburg 12th International Workshop
Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings
Operatin g Systems: Internals and Design Principle s Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Bear in mind,
System Software and TinyAUTOSAR
System Software and TinyAUTOSAR Florian Kluge University of Augsburg, Germany parmerasa Dissemination Event, Barcelona, 2014-09-23 Overview parmerasa System Architecture Library RTE Implementations TinyIMA
Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support
Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support Ranjit Noronha and Dhabaleswar K. Panda Network Based Computing Laboratory (NBCL) The Ohio State University Outline
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study
CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what
VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5
Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Resource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
Lecture 1 Introduction to Android
These slides are by Dr. Jaerock Kwon at. The original URL is http://kettering.jrkwon.com/sites/default/files/2011-2/ce-491/lecture/alecture-01.pdf so please use that instead of pointing to this local copy
Real Time Programming: Concepts
Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
OpenMosix Presented by Dr. Moshe Bar and MAASK [01]
OpenMosix Presented by Dr. Moshe Bar and MAASK [01] openmosix is a kernel extension for single-system image clustering. openmosix [24] is a tool for a Unix-like kernel, such as Linux, consisting of adaptive
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
GROUPWARE. Ifeoluwa Idowu
GROUPWARE Ifeoluwa Idowu GROUPWARE What is Groupware? Definitions of Groupware Computer-based systems that support groups of people engaged in a common task (or goal) and that provide an interface to a
Software and the Concurrency Revolution
Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
Performance and Implementation Complexity in Multiprocessor Operating System Kernels
Performance and Implementation Complexity in Multiprocessor Operating System Kernels Simon Kågström Department of Systems and Software Engineering Blekinge Institute of Technology Ronneby, Sweden http://www.ipd.bth.se/ska
q for Gods Whitepaper Series (Edition 7) Common Design Principles for kdb+ Gateways
Series (Edition 7) Common Design Principles for kdb+ Gateways May 2013 Author: Michael McClintock joined First Derivatives in 2009 and has worked as a consultant on a range of kdb+ applications for hedge
Operating Systems for Embedded Computers
University of Zagreb Faculty of Electrical Engineering and Computing Department of Electronics, Microelectronics, Computer and Intelligent Systems Operating Systems for Embedded Computers Summary of textbook:
Advisor Counsel. Computer basics and Programming. Introduction to Engineering Design. C Programming Project. Digital Engineering
Course Description ( 전체개설교과목개요 ) Advisor Counsel Yr. : Sem. : Course Code: CD0001 Advisor in the department which programs engineering education guides certificate program educational objectives, learning
A Comparison of Distributed Systems: ChorusOS and Amoeba
A Comparison of Distributed Systems: ChorusOS and Amoeba Angelo Bertolli Prepared for MSIT 610 on October 27, 2004 University of Maryland University College Adelphi, Maryland United States of America Abstract.
Smedge Got a Render Farm? The Smedge that holds it all together
Got a Render Farm? Smedge 2010 The Smedge that holds it all together The next generation technology of Smedge is all you need to get your farm working for you without specialized training and without breaking
Optimizing Web Performance with TBB
Optimizing Web Performance with TBB Open Parallel Ltd lenz.gschwendtner@openparallel.com August 11, 2011 ii Open Parallel is a research and development company that focuses on parallel programming and
Performance Testing. Configuration Parameters for Performance Testing
Optimizing an ecommerce site for performance on a global scale requires additional oversight, budget, dedicated technical resources, local expertise, and specialized vendor solutions to ensure that international
PTask: Operating System Abstractions To Manage GPUs as Compute Devices
PTask: Operating System Abstractions To Manage GPUs as Compute Devices C.J. Rossbach, J. Currey - Microsoft Research B. Ray, E. Witchel - University of Texas M.Silberstein - Technion Presentation: Adam
VMware and CPU Virtualization Technology. Jack Lo Sr. Director, R&D
VMware and CPU Virtualization Technology Jack Lo Sr. Director, R&D This presentation may contain VMware confidential information. Copyright 2005 VMware, Inc. All rights reserved. All other marks and names mentioned
Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format
, pp.91-100 http://dx.doi.org/10.14257/ijhit.2014.7.4.09 Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format Jingjing Zheng 1,* and Ting Wang 1, 2 1,* Parallel Software and Computational
Multicore Programming with LabVIEW Technical Resource Guide
Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN
Why Threads Are A Bad Idea (for most purposes)
Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories john.ousterhout@eng.sun.com http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).
Tutorial: Getting Started
9 Tutorial: Getting Started INFRASTRUCTURE A MAKEFILE PLAIN HELLO WORLD APERIODIC HELLO WORLD PERIODIC HELLO WORLD WATCH THOSE REAL-TIME PRIORITIES THEY ARE SERIOUS SUMMARY Getting started with a new platform
Real-Time Scheduling 1 / 39
Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A
THREADS WINDOWS XP SCHEDULING
Threads Windows, ManyCores, and Multiprocessors THREADS WINDOWS XP SCHEDULING (c) Peter Sturm, University of Trier, D-54286 Trier, Germany 1 Classification Two level scheduling 1st level uses priorities
Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest
Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could
Lesson Objectives. To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization
Lesson Objectives To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization AE3B33OSD Lesson 1 / Page 2 What is an Operating System? A
Course Development of Programming for General-Purpose Multicore Processors
Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 wzhang4@vcu.edu
A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details
Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting
159.735. Final Report. Cluster Scheduling. Submitted by: Priti Lohani 04244354
159.735 Final Report Cluster Scheduling Submitted by: Priti Lohani 04244354 1 Table of contents: 159.735... 1 Final Report... 1 Cluster Scheduling... 1 Table of contents:... 2 1. Introduction:... 3 1.1
4. The Android System
4. The Android System 4. The Android System System-on-Chip Emulator Overview of the Android System Stack Anatomy of an Android Application 73 / 303 4. The Android System Help Yourself Android Java Development
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural resources.
Kjetil Matheussen Norwegian Center for Technology in Music and the Arts. (NOTAM) k.s.matheussen@notam02.no. Abstract. Keywords.
Implementing a Polyphonic MIDI Software Synthesizer using Coroutines, Realtime Garbage Collection, Closures, Auto-Allocated Variables, Dynamic Scoping, and Continuation Passing Style Programming Kjetil
2. An Operating System, What For?
2. An Operating System, What For? 2. An Operating System, What For? Operating System Tasks Survey of Operating System Principles 14 / 352 2. An Operating System, What For? Batch Processing Punched Cards
theguard! ApplicationManager System Windows Data Collector
theguard! ApplicationManager System Windows Data Collector Status: 10/9/2008 Introduction... 3 The Performance Features of the ApplicationManager Data Collector for Microsoft Windows Server... 3 Overview
Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure
Server Manager Diagnostics Page 653. Information. Audit Success. Audit Failure The view shows the total number of events in the last hour, 24 hours, 7 days, and the total. Each of these nodes can be expanded
Introduction to Java A First Look
Introduction to Java A First Look Java is a second or third generation object language Integrates many of best features Smalltalk C++ Like Smalltalk Everything is an object Interpreted or just in time
Parallel Firewalls on General-Purpose Graphics Processing Units
Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering
There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.
ASSURING PERFORMANCE IN E-COMMERCE SYSTEMS Dr. John Murphy Abstract Performance Assurance is a methodology that, when applied during the design and development cycle, will greatly increase the chances
Client/Server Computing Distributed Processing, Client/Server, and Clusters
Client/Server Computing Distributed Processing, Client/Server, and Clusters Chapter 13 Client machines are generally single-user PCs or workstations that provide a highly userfriendly interface to the
Virtual Machines. www.viplavkambli.com
1 Virtual Machines A virtual machine (VM) is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software
The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology
3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related
An Easier Way for Cross-Platform Data Acquisition Application Development
An Easier Way for Cross-Platform Data Acquisition Application Development For industrial automation and measurement system developers, software technology continues making rapid progress. Software engineers
Four Keys to Successful Multicore Optimization for Machine Vision. White Paper
Four Keys to Successful Multicore Optimization for Machine Vision White Paper Optimizing a machine vision application for multicore PCs can be a complex process with unpredictable results. Developers need
Page 1 of 5. IS 335: Information Technology in Business Lecture Outline Operating Systems
Lecture Outline Operating Systems Objectives Describe the functions and layers of an operating system List the resources allocated by the operating system and describe the allocation process Explain how
Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual
Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library
Clonecloud: Elastic execution between mobile device and cloud [1]
Clonecloud: Elastic execution between mobile device and cloud [1] ACM, Intel, Berkeley, Princeton 2011 Cloud Systems Utility Computing Resources As A Service Distributed Internet VPN Reliable and Secure
A Look through the Android Stack
A Look through the Android Stack Free Electrons Maxime Ripard Free Electrons Embedded Linux Developers © Copyright 2004-2012, Free Electrons. Creative Commons BY-SA 3.0
Using Power to Improve C Programming Education
Using Power to Improve C Programming Education Jonas Skeppstedt Department of Computer Science Lund University Lund, Sweden jonas.skeppstedt@cs.lth.se jonasskeppstedt.net
Operating System. Lecture Slides By Silberschatz, Galvin & Gagne (8 th Edition) Modified By: Prof. Mitul K. Patel
Operating System Lecture Slides By Silberschatz, Galvin & Gagne (8 th Edition) Modified By: Prof. Mitul K. Patel Shree Swami Atmanand Saraswati Institute of Technology, Surat January 2012 Outline 1 Chapter
Extreme High Performance Computing or Why Microkernels Suck
Extreme High Performance Computing or Why Microkernels Suck Christoph Lameter sgi clameter@sgi.com Abstract One often wonders how well Linux scales. We frequently get suggestions that Linux cannot scale
Design Issues in a Bare PC Web Server
Design Issues in a Bare PC Web Server Long He, Ramesh K. Karne, Alexander L. Wijesinha, Sandeep Girumala, and Gholam H. Khaksari Department of Computer & Information Sciences, Towson University, 78 York
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
Introduction to the NI Real-Time Hypervisor
Introduction to the NI Real-Time Hypervisor 1 Agenda 1) NI Real-Time Hypervisor overview 2) Basics of virtualization technology 3) Configuring and using Real-Time Hypervisor systems 4) Performance and
SYSTEM ecos Embedded Configurable Operating System
BELONGS TO THE CYGNUS SOLUTIONS founded about 1989 initiative connected with an idea of free software ( commercial support for the free software ). Recently merged with RedHat. CYGNUS was also the original
Operating Systems Concepts: Chapter 7: Scheduling Strategies
Operating Systems Concepts: Chapter 7: Scheduling Strategies Olav Beckmann Huxley 449 http://www.doc.ic.ac.uk/~ob3 Acknowledgements: There are lots. See end of Chapter 1. Home Page for the course: http://www.doc.ic.ac.uk/~ob3/teaching/operatingsystemsconcepts/
features at a glance
hp availability stats and performance software network and system monitoring for hp NonStop servers a product description from hp features at a glance Online monitoring of object status and performance
Chapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
Multilevel Load Balancing in NUMA Computers
FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.pucrs.br/inf/pos/ Multilevel Load Balancing in NUMA Computers M. Corrêa, R. Chanin, A. Sales, R. Scheer, A. Zorzo Technical Report Series Number 049 July,
What Is an RTOS and Why Use One? May, 2013
What Is an RTOS and Why Use One? May, 2013 What is an Embedded System? Dedicated to a specific purpose Components: Microprocessor Application program Real-Time Operating System (RTOS) RTOS and application
Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines
Chapter 3: Operating-System Structures System Components Operating System Services System Calls System Programs System Structure Virtual Machines Operating System Concepts 3.1 Common System Components
Developing a dynamic, real-time IT infrastructure with Red Hat integrated virtualization
Developing a dynamic, real-time IT infrastructure with Red Hat integrated virtualization www.redhat.com Table of contents Introduction Page 3 Benefits of virtualization Page 3 Virtualization challenges
Multicore Systems Challenges for the Real-Time Software Developer
Multicore Systems Challenges for the Real-Time Software Developer Dr. Fridtjof Siebert aicas GmbH Haid-und-Neu-Str. 18 76131 Karlsruhe, Germany siebert@aicas.com Abstract Multicore systems have become
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2.
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2 Reference IBM Tivoli Composite Application Manager for Microsoft Applications:
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
LSKA 2010 Survey Report Job Scheduler
LSKA 2010 Survey Report Job Scheduler Graduate Institute of Communication Engineering {r98942067, r98942112}@ntu.edu.tw March 31, 2010 1. Motivation Recently, the computing becomes much more complex. However,
Symmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called