Software and the Concurrency Revolution
|
|
|
- Lucinda Wells
- 10 years ago
- Views:
Transcription
1 Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox ) 2007 Page: 1
2 Truths Consequences Futures The Last Slide First Although driven by hardware changes, the concurrency revolution is primarily a software revolution. Parallel hardware is not more of the same. It s a fundamentally different mainstream hardware architecture, even if the instruction set looks the same. (But using the same ISA does give us a great compatibility/migration story.) Software requires the most changes to regain the free lunch. The concurrency sea change impacts the entire software stack: Tools, languages, libraries, runtimes, operating systems. No programming language can ignore it and remain relevant. (Side benefit: Responsiveness, the other reason to want async code.) Software is also the gating factor. We will now do for concurrency what we did for OO and GUIs. Beyond the gate, hardware concurrency is coming more and sooner than most people yet believe Page: 2
3 Each Year We Get Faster More Processors Intel CPU Trends (sources: Intel, Wikipedia, K. Olukotun) 386 Pentium Montecito Moore s Law Historically: Boost singlestream performance via more complex chips. Now: Deliver more cores per chip (+ GPU, NIC, SoC). The free lunch is over for today s sequential apps and many concurrent apps. We need killer apps with lots of latent parallelism. A generational advance >OO is necessary to get above the threads+locks programming model. Each Year We Get Faster More Processors 1.7Bt 4.5Mt = ~100 P55Cs + 16 MB L3$ Montecito Intel CPU Trends (sources: Intel, Wikipedia, K. Olukotun) 386 Pentium Moore s Law Historically: Boost singlestream performance via more complex chips. Now: Deliver more cores per chip (+ GPU, NIC, SoC). The free lunch is over for today s sequential apps and many concurrent apps. We need killer apps with lots of latent parallelism. A generational advance >OO is necessary to get above the threads+locks programming model Page: 3
4 Educational State of the Union: May 2007 We have achieved general awareness. Most people know that the free lunch is over for sequential applications, and that the future is multicore and manycore. But few people really yet believe the magnitude, speed, and gating factor of the change: Magnitude: Comparable to the GUI revolution plus moving to a new hardware platform, simultaneously. Enabling manycore affects the entire software stack, from tools to languages to frameworks/libraries to runtimes. Speed: (Recall: Intel could build 100-Pentium chips today if they wanted to.) 100-way HW concurrency could be available in commodity desktops as soon as the year Gating factor: manycore-exploiting software is available. Truths Consequences Futures 2007 Page: 4
5 The Issue Is (Mostly) On the Client Solved : Server apps (e.g., DB servers, web services). Lots of independent requests one thread per request is easy. Typical to execute many copies of the same code. Shared data usually via structured databases: Automatic implicit concurrency control via transactions. With some care, concurrency problem is already solved here. Not solved: Typical client apps (i.e., not Photoshop). Somehow employ many threads per user request. Highly atypical to execute many copies of the same code. Shared data in unstructured shared memory: Error prone explicit locking where are the transactions? A Tale of Six Software Waves Each was born during , bided its time until 1990s/00s, then took 5+ years to build a mature tool/language/framework/runtime ecosystem. GUIs Objects GC Generics Net Concurrency Page: 5
6 Comparative Impacts Across the Stack Application programming model Libraries and frameworks Languages and compilers Runtimes and OS Tools (design, measure, test) GUIs Objects GC Generics Web Concurrency Extent of impact and/or enabling importance = Some, could drive one major product release = Major, could drive more than one major product release = New way of thinking / essential enabler, or multiple major releases O(1), O(K), or O(N) Concurrency? 1. Sequential apps. The free lunch is over (if CPU-bound): Flat or merely incremental perf. improvements. Potentially poor responsiveness. 2. Explicitly threaded apps. Hardwired # of threads that prefer K CPUs (for a given input workload). Can penalize <K CPUs, doesn t scale >K CPUs Scalable concurrent apps. Workload decomposed into a sea of heterogeneous chores. Lots of latent concurrency we can map down to N cores Page: 6
7 An OO for Concurrency OO Fortran, C, asm threads + locks semaphores Three Pillars of the Dawn A Framework for Evaluating Concurrency Summary Examples Asynchronous Agents Tasks that run independently and communicate via messages GUIs, background printing, disk/net access Concurrent Collections Operations on groups of things; exploit parallelism in data and algorithm structures Trees, quicksort, compilation Key metrics Responsiveness Throughput, manycore scalability Mutable Shared State Avoid races by synchronizing mutable objects in shared memory Locked data (99%), lock-free libraries (written by wizards) Race-free, deadlock-free Requirements Isolation, messaging Low overhead Composability Today s abstractions Possible new abstractions Threads, message queues Active objects, futures Thread pools, OpenMP Chores, futures, parallel STL, PLINQ Locks Transactional memory, declarative support for locks 2007 Page: 7
8 15 The Trouble With Locks How may I harm thee? Let me count the ways: { Lock lock1( muttable1 ); Lock lock2( muttable2 ); table1.erase( x ); table2.insert( x ); } Locks are the best we have, and known to be inadequate: Which lock? The connection between a lock and the data it protects exists primarily in the mind of a programmer. Deadlock? If the mutexes aren t acquired in the same order. Simplicity or scalability? Coarse-grained locks are simpler to use correctly, but easily become bottlenecks. Lost wake-ups? Blocking typically uses condition variables, and it s easy to forget to signal the correct ones. (Example coming up.) Not composable. In today s world, this is a deadly sin Page: 8
9 Enter Transactional Memory A transactional programming model: atomic { table1.erase( x ); table2.insert( x ); } Idea: Version memory like a database. Automatic concurrency control, rollback and retry for competing transactions. Benefits: No need to remember which lock to take. No deadlock: No need to remember a sync order discipline. Both fine-grained and scalable without sacrificing correctness. No wake-up calls as atomic blocks are automatically re-run. Best of all: Composable. Can once again build a big (correct) program out of smaller (correct) programs. Drawbacks: Still active research. And the elephant in the room is I/O (more later). Enter Transactional Memory (2) What if we want to block if x isn t in table1 yet? Today, we d typically use a wake-up: Wait on a condition variable, and have every thread inserting into table1 remember to signal the right cv. Tedious, error-prone, etc. Instead: atomic { if( table1.find(x) == table1.end() ) retry; table1.erase( x ); table2.insert( x ); } retry restarts the current atomic block from the beginning. Blocking is entirely modular: The call to retry might be deeply buried inside the implementation of debit, or of credit, or both. Can use transaction log to defer re-running until at least one memory location read by the transaction has changed Page: 9
10 A 4-Step Program For the Lock Addiction Greatly reduce locks. (Alas, not eliminate. ) 1. Enable transactional programming: Transactional memory is our best hope. Composable atomic { } blocks. Naturally enables speculative execution. (The elephant: Allowing I/O. The Achilles heel: Some resources are not transactable.) 2. Abstractions to reduce shared : Messages. Futures. Private data (e.g., active objects). 3. Techniques to reduce mutable : Immutable objects. Internally versioned objects. 4. Some locks will remain. Let the programmer declare: Which shared objects are protected by which locks. Lock hierarchies (caveat: also not composable). An OO for Concurrency active objects, chores, and futures parallel algorithms OO atomic { } locks: declare intent Fortran, C, asm threads + locks semaphores 2007 Page: 10
11 Truths Consequences Futures A Good Question: How Many Cores Are You Coding For? OoO - cores Page: 11
12 the truth is somewhere in here Future Many- Core Systems Complex core (OoO, prediction, pipelining, speculation) Simple core (InO) All large cores All small cores Mixed large & small cores (legacy code & Amdahl s Law) A Better Question: How Many Hardware Threads Are You Coding For? InO - threads InO - cores OoO - cores Page: 12
13 The First Slide Last Although driven by hardware changes, the concurrency revolution is primarily a software revolution. Parallel hardware is not more of the same. It s a fundamentally different mainstream hardware architecture, even if the instruction set looks the same. (But using the same ISA does give us a great compatibility/migration story.) Software requires the most changes to regain the free lunch. The concurrency sea change impacts the entire software stack: Tools, languages, libraries, runtimes, operating systems. No programming language can ignore it and remain relevant. (Side benefit: Responsiveness, the other reason to want async code.) Software is also the gating factor. We will now do for concurrency what we did for OO and GUIs. Beyond the gate, hardware concurrency is coming more and sooner than most people yet believe. For More Information The Free Lunch Is Over (Dr. Dobb s Journal, March 2005) The article that first used the terms the free lunch is over and concurrency revolution to describe the sea change. Software and the (with Jim Larus; ACM Queue, September 2005) showpage&pid=332 Why locks, functional languages, and other silver bullets aren t the answer, and observations on what we need for a great leap forward in languages and also in tools Page: 13
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
Processor Architectures
ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture
Spring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem
Control 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
Why Threads Are A Bad Idea (for most purposes)
Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories [email protected] http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).
SEER PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27 th Symposium on Parallel Architectures and Algorithms
27 th Symposium on Parallel Architectures and Algorithms SEER PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY Nuno Diegues, Paolo Romano and Stoyan Garbatov Seer: Scheduling for Commodity
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture References Anatomy of a database system. J. Hellerstein and M. Stonebraker. In Red Book (4th
Processes and Non-Preemptive Scheduling. Otto J. Anshus
Processes and Non-Preemptive Scheduling Otto J. Anshus 1 Concurrency and Process Challenge: Physical reality is Concurrent Smart to do concurrent software instead of sequential? At least we want to have
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
Understanding Hardware Transactional Memory
Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence
Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:
Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.
Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak
Cray Gemini Interconnect Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak Outline 1. Introduction 2. Overview 3. Architecture 4. Gemini Blocks 5. FMA & BTA 6. Fault tolerance
A Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
Performance evaluation
Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography
News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
An Introduction to Parallel Computing/ Programming
An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European
Scalability evaluation of barrier algorithms for OpenMP
Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM
A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
Parallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
This Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings
This Unit: Multithreading (MT) CIS 501 Computer Architecture Unit 10: Hardware Multithreading Application OS Compiler Firmware CU I/O Memory Digital Circuits Gates & Transistors Why multithreading (MT)?
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
Disk Storage Shortfall
Understanding the root cause of the I/O bottleneck November 2010 2 Introduction Many data centers have performance bottlenecks that impact application performance and service delivery to users. These bottlenecks
The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.
White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
Petascale Software Challenges. Piyush Chaudhary [email protected] High Performance Computing
Petascale Software Challenges Piyush Chaudhary [email protected] High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
Chapter 6, The Operating System Machine Level
Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General
Scala Actors Library. Robert Hilbrich
Scala Actors Library Robert Hilbrich Foreword and Disclaimer I am not going to teach you Scala. However, I want to: Introduce a library Explain what I use it for My Goal is to: Give you a basic idea about
Facing the Challenges for Real-Time Software Development on Multi-Cores
Facing the Challenges for Real-Time Software Development on Multi-Cores Dr. Fridtjof Siebert aicas GmbH Haid-und-Neu-Str. 18 76131 Karlsruhe, Germany [email protected] Abstract Multicore systems introduce
Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
How To Write A Multi Threaded Software On A Single Core (Or Multi Threaded) System
Multicore Systems Challenges for the Real-Time Software Developer Dr. Fridtjof Siebert aicas GmbH Haid-und-Neu-Str. 18 76131 Karlsruhe, Germany [email protected] Abstract Multicore systems have become
Embedded Systems: map to FPGA, GPU, CPU?
Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven [email protected] Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware
Application Performance Analysis Tools and Techniques
Mitglied der Helmholtz-Gemeinschaft Application Performance Analysis Tools and Techniques 2012-06-27 Christian Rössel Jülich Supercomputing Centre [email protected] EU-US HPC Summer School Dublin
Course Development of Programming for General-Purpose Multicore Processors
Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 [email protected]
Java Performance. Adrian Dozsa TM-JUG 18.09.2014
Java Performance Adrian Dozsa TM-JUG 18.09.2014 Agenda Requirements Performance Testing Micro-benchmarks Concurrency GC Tools Why is performance important? We hate slow web pages/apps We hate timeouts
Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?
Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and
Next Generation Operating Systems
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the
Lecture 1 Introduction to Parallel Programming
Lecture 1 Introduction to Parallel Programming EN 600.320/420 Instructor: Randal Burns 4 September 2008 Department of Computer Science, Johns Hopkins University Pipelined Processor From http://arstechnica.com/articles/paedia/cpu/pipelining-2.ars
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
COM 444 Cloud Computing
COM 444 Cloud Computing Lec 3: Virtual Machines and Virtualization of Clusters and Datacenters Prof. Dr. Halûk Gümüşkaya [email protected] [email protected] http://www.gumuskaya.com Virtual
MS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.
ASSURING PERFORMANCE IN E-COMMERCE SYSTEMS Dr. John Murphy Abstract Performance Assurance is a methodology that, when applied during the design and development cycle, will greatly increase the chances
9/26/2011. What is Virtualization? What are the different types of virtualization.
CSE 501 Monday, September 26, 2011 Kevin Cleary [email protected] What is Virtualization? What are the different types of virtualization. Practical Uses Popular virtualization products Demo Question,
Performance metrics for parallel systems
Performance metrics for parallel systems S.S. Kadam C-DAC, Pune [email protected] C-DAC/SECG/2006 1 Purpose To determine best parallel algorithm Evaluate hardware platforms Examine the benefits from parallelism
Design Issues in a Bare PC Web Server
Design Issues in a Bare PC Web Server Long He, Ramesh K. Karne, Alexander L. Wijesinha, Sandeep Girumala, and Gholam H. Khaksari Department of Computer & Information Sciences, Towson University, 78 York
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
Multicore Programming with LabVIEW Technical Resource Guide
Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN
End-user Tools for Application Performance Analysis Using Hardware Counters
1 End-user Tools for Application Performance Analysis Using Hardware Counters K. London, J. Dongarra, S. Moore, P. Mucci, K. Seymour, T. Spencer Abstract One purpose of the end-user tools described in
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis
Performance Metrics and Scalability Analysis 1 Performance Metrics and Scalability Analysis Lecture Outline Following Topics will be discussed Requirements in performance and cost Performance metrics Work
Optimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
Boost SQL Server Performance Buffer Pool Extensions & Delayed Durability
Boost SQL Server Performance Buffer Pool Extensions & Delayed Durability Manohar Punna President - SQLServerGeeks #509 Brisbane 2016 Agenda SQL Server Memory Buffer Pool Extensions Delayed Durability Analysis
Eloquence Training What s new in Eloquence B.08.00
Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium
Virtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems
Fastboot Techniques for x86 Architectures Marcus Bortel Field Application Engineer QNX Software Systems Agenda Introduction BIOS and BIOS boot time Fastboot versus BIOS? Fastboot time Customizing the boot
Performance of the JMA NWP models on the PC cluster TSUBAME.
Performance of the JMA NWP models on the PC cluster TSUBAME. K.Takenouchi 1), S.Yokoi 1), T.Hara 1) *, T.Aoki 2), C.Muroi 1), K.Aranami 1), K.Iwamura 1), Y.Aikawa 1) 1) Japan Meteorological Agency (JMA)
Informatica Ultra Messaging SMX Shared-Memory Transport
White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
Symmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called
SYSTEM ecos Embedded Configurable Operating System
BELONGS TO THE CYGNUS SOLUTIONS founded about 1989 initiative connected with an idea of free software ( commercial support for the free software ). Recently merged with RedHat. CYGNUS was also the original
Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database
WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
Scaling in a Hypervisor Environment
Scaling in a Hypervisor Environment Richard McDougall Chief Performance Architect VMware VMware ESX Hypervisor Architecture Guest Monitor Guest TCP/IP Monitor (BT, HW, PV) File System CPU is controlled
Effective Java Programming. efficient software development
Effective Java Programming efficient software development Structure efficient software development what is efficiency? development process profiling during development what determines the performance of
Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Modern GPU
Bigdata High Availability (HA) Architecture
Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources
International Journal of Advancements in Research & Technology, Volume 1, Issue6, November-2012 1 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 1, Issue6, November-2012 1 VIRTUALIZATION Vikas Garg Abstract: The main aim of the research was to get the knowledge of present trends
Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012
Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs Jim Ready September 18, 2012 How HW guys view the world SW Software HW How SW guys view the world SW HW Reality The SoC Software
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet
Turbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
Intel DPDK Boosts Server Appliance Performance White Paper
Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks
Data Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
Maximizing SQL Server Virtualization Performance
Maximizing SQL Server Virtualization Performance Michael Otey Senior Technical Director Windows IT Pro SQL Server Pro 1 What this presentation covers Host configuration guidelines CPU, RAM, networking
WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE
WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected]
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected] Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
Parallel Computing. Parallel shared memory computing with OpenMP
Parallel Computing Parallel shared memory computing with OpenMP Thorsten Grahs, 14.07.2014 Table of contents Introduction Directives Scope of data Synchronization OpenMP vs. MPI OpenMP & MPI 14.07.2014
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
