Instruction scheduling
|
|
|
- Robert Andrews
- 10 years ago
- Views:
Transcription
1 Instruction ordering Instruction scheduling Advanced Compiler Construction Michel Schinz When a compiler emits the instructions corresponding to a program, it imposes a total order on them. However, that order is usually not the only valid one, in the sense that it can be changed without modifying the program s behavior. For example, if two instructions i 1 and i 2 appear sequentially in that order and are independent, then it is possible to swap them. 1 2 Instruction scheduling Pipeline stalls Among all the valid permutations of the instructions composing a program i.e. those that preserve the program s behavior some can be more desirable than others. For example, one order might lead to a faster program on some machine, because of architectural constraints. The aim of instruction scheduling is to find a valid order that optimizes some metric, like execution speed. Modern, pipelined architectures can usually issue at least one instruction per clock cycle. However, an instruction can be executed only if the data it needs is ready. Otherwise, the pipeline stalls for one or several cycles. Stalls can appear because some instructions, e.g. division, require several cycles to complete, or because data has to be fetched from memory. 3 4
2 Scheduling example Scheduling example The following example will illustrate how proper scheduling can reduce the time required to execute a piece of RTL code. We assume the following delays for instructions: Instruction kind RTL notation Delay Memory load or store Ra Mem[Rb+c] 3 Mem[R b+c] R a Multiplication R a R b * R c 2 Cycle Instruction 1 R 1 Mem[R SP] 4 R 1 R 1 + R 1 5 R 2 Mem[R SP+1] 8 R 1 R 1 * R 2 9 R 2 Mem[R SP+2] 12 R 1 R 1 * R 2 13 R 2 Mem[R SP+3] 16 R 1 R 1 * R 2 Cycle Instruction 1 R 1 Mem[R SP] 2 R 2 Mem[R SP+1] 3 R 3 Mem[R SP+2] 4 R 1 R 1 + R 1 5 R 1 R 1 * R 2 6 R 2 Mem[R SP+3] 7 R 1 R 1 * R 3 9 R 1 R 1 * R 2 Addition R a R b + R c 1 18 Mem[R SP+4] R 1 11 Mem[R SP+4] R 1 After scheduling (including renaming), the last instruction is issued at cycle 11 instead of 18! 5 6 Instruction dependences Data dependences An instruction i 2 depends on an instruction i 1 when it is not possible to execute i 2 before i 1 without changing the behavior of the program. The most common reason for dependence is datadependence: i 2 uses a value that is computed by i 1. However, as we will see, there are other kinds of dependences. We distinguish three kinds of dependences between two instructions i 1 and i 2 : 1. true dependence i 2 reads a value written by i 1 (read after write or RAW), 2. antidependence i 2 writes a value read by i 1 (write after read or WAR), 3. antidependence i 2 writes a value written by i 1 (write after write or WAW). 7 8
3 Antidependences Antidependences are not real dependences, in the sense that they do not arise from the flow of data. They are due to a single location being used to store different values. Most of the time, antidependences can be removed by renaming locations e.g. registers. In the example below, the program on the left contains a WAW antidependence between the two memory load instructions, that can be removed by renaming the second use of R 1. R 1 Mem[R SP] R 1 Mem[R SP] Computing dependences Identifying dependences among instructions that only access registers is easy. Instructions that access memory are harder to handle. In general, it is not possible to know whether two such instructions refer to the same memory location. Conservative approximations not examined here therefore have to be used. R 1 Mem[R SP+1] R 2 Mem[R SP+1] R 4 R 4 + R Dependence graph Dependence graph example Name Instruction a The dependence graph is a directed graph representing dependences among instructions. Its nodes are the instructions to schedule, and there is an edge from node n 1 to node n 2 iff the instruction of n 2 depends on n 1. Any topological sort of the nodes of this graph represents a valid way to schedule the instructions. a R 1 Mem[R SP] b R 1 R 1 + R 1 c R 2 Mem[R SP+1] d R 1 R 1 * R 2 e R 2 Mem[R SP+2] f R 1 R 1 * R 2 g R 2 Mem[R SP+3] h R 1 R 1 * R 2 i Mem[R SP+4] R 1 b d c f e h g i true dependence antidependence 11 12
4 Difficulty of scheduling List scheduling algorithm Optimal instruction scheduling is NP-complete. As always, this implies that we will use techniques based on heuristics to find a good but sometimes not optimal solution to that problem. List scheduling is a technique to schedule the instructions of a single basic block. Its basic idea is to simulate the execution of the instructions, and to try to schedule instructions only when all their operands can be used without stalling the pipeline. The list scheduling algorithm maintains two lists: ready is the list of instructions that could be scheduled without stall, ordered by priority, active is the list of instructions that are being executed. At each step, the highest-priority instruction from ready is scheduled, and moved to active, where it stays for a time equal to its delay. Before scheduling is performed, renaming is done to remove all antidependences that can be removed Prioritizing instructions Nodes (i.e. instructions) are sorted by priority in the ready list. Several schemes exist to compute the priority of a node, which can be equal to: the length of the longest latency-weighted path from it to a root of the dependence graph, the number of its immediate successors, the number of its descendants, its latency, etc. Unfortunately, no single scheme is better for all cases. b 10 d 9 List scheduling example a 13 Cycle ready active c 12 f 7 e 10 h 5 priority g 8 i 3 A node's priority is the length of the longest latencyweighted path from it to a root of the dependence graph 1 [a 13,c 12,e 10,g 8 ] [a] 2 [c 12,e 10,g 8 ] [a,c] 3 [e 10,g 8 ] [a,c,e] 4 [b 10,g 8 ] [b,c,e] 5 [d 9,g 8 ] [d,e] 6 [g 8 ] [d,g] 7 [f 7 ] [f,g] 8 [] [f,g] 9 [h 5 ] [h] 10 [] [h] 11 [i 3 ] [i] 12 [] [i] 13 [] [i] 14 [] [] 15 16
5 Scheduling conflicts It is hard to decide whether scheduling should be done before or after register allocation. If register allocation is done first, it can introduce antidependences when reusing registers. If scheduling is done first, register allocation can introduce spilling code, destroying the schedule. Solution: schedule first, then allocate registers and schedule once more if spilling was necessary. 17
Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers
CS 4 Introduction to Compilers ndrew Myers Cornell University dministration Prelim tomorrow evening No class Wednesday P due in days Optional reading: Muchnick 7 Lecture : Instruction scheduling pr 0 Modern
Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas
Software Pipelining by Modulo Scheduling Philip Sweany University of North Texas Overview Opportunities for Loop Optimization Software Pipelining Modulo Scheduling Resource and Dependence Constraints Scheduling
AN IMPLEMENTATION OF SWING MODULO SCHEDULING WITH EXTENSIONS FOR SUPERBLOCKS TANYA M. LATTNER
AN IMPLEMENTATION OF SWING MODULO SCHEDULING WITH EXTENSIONS FOR SUPERBLOCKS BY TANYA M. LATTNER B.S., University of Portland, 2000 THESIS Submitted in partial fulfillment of the requirements for the degree
Pipelining Review and Its Limitations
Pipelining Review and Its Limitations Yuri Baida [email protected] [email protected] October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic
INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER
Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano [email protected] Prof. Silvano, Politecnico di Milano
Computer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations
Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations 1 Problem 6 Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique
Using Power to Improve C Programming Education
Using Power to Improve C Programming Education Jonas Skeppstedt Department of Computer Science Lund University Lund, Sweden [email protected] jonasskeppstedt.net jonasskeppstedt.net [email protected]
PROBLEMS. which was discussed in Section 1.6.3.
22 CHAPTER 1 BASIC STRUCTURE OF COMPUTERS (Corrisponde al cap. 1 - Introduzione al calcolatore) PROBLEMS 1.1 List the steps needed to execute the machine instruction LOCA,R0 in terms of transfers between
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters
Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.
PROBLEMS #20,R0,R1 #$3A,R2,R4
506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #$3A,R2,R4 R0,R2,R5 In all instructions,
VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU
VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU Martin Straka Doctoral Degree Programme (1), FIT BUT E-mail: [email protected] Supervised by: Zdeněk Kotásek E-mail: [email protected]
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
A Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
Towards Optimal Firewall Rule Ordering Utilizing Directed Acyclical Graphs
Towards Optimal Firewall Rule Ordering Utilizing Directed Acyclical Graphs Ashish Tapdiya and Errin W. Fulp Department of Computer Science Wake Forest University Winston Salem, NC, USA nsg.cs.wfu.edu Email:
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
Optimizing Configuration and Application Mapping for MPSoC Architectures
Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : [email protected] 1 Multi-Processor Systems on Chip (MPSoC) Design Trends
Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit
Data Structures Page 1 of 24 A.1. Arrays (Vectors) n-element vector start address + ielementsize 0 +1 +2 +3 +4... +n-1 start address continuous memory block static, if size is known at compile time dynamic,
A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti 2, Nidhi Rajak 3
A STUDY OF TASK SCHEDULING IN MULTIPROCESSOR ENVIROMENT Ranjit Rajak 1, C.P.Katti, Nidhi Rajak 1 Department of Computer Science & Applications, Dr.H.S.Gour Central University, Sagar, India, [email protected]
CS521 CSE IITG 11/23/2012
CS521 CSE TG 11/23/2012 A Sahu 1 Degree of overlap Serial, Overlapped, d, Super pipelined/superscalar Depth Shallow, Deep Structure Linear, Non linear Scheduling of operations Static, Dynamic A Sahu slide
NP-complete? NP-hard? Some Foundations of Complexity. Prof. Sven Hartmann Clausthal University of Technology Department of Informatics
NP-complete? NP-hard? Some Foundations of Complexity Prof. Sven Hartmann Clausthal University of Technology Department of Informatics Tractability of Problems Some problems are undecidable: no computer
Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz 2007 03 16
Advanced compiler construction Michel Schinz 2007 03 16 General course information Teacher & assistant Course goals Teacher: Michel Schinz [email protected] Assistant: Iulian Dragos INR 321, 368 64
CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE
CHAPTER 5 71 FINITE STATE MACHINE FOR LOOKUP ENGINE 5.1 INTRODUCTION Finite State Machines (FSMs) are important components of digital systems. Therefore, techniques for area efficiency and fast implementation
Dynamic Network Resources Allocation in Grids through a Grid Network Resource Broker
INGRID 2007 Instrumenting the GRID Second International Workshop on Distributed Cooperative Laboratories Session 2: Networking for the GRID Dynamic Network Resources Allocation in Grids through a Grid
Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic
Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic The challenge When building distributed, large-scale applications, quality assurance (QA) gets increasingly
Integrated System Modeling for Handling Big Data in Electric Utility Systems
Integrated System Modeling for Handling Big Data in Electric Utility Systems Stephanie Hamilton Brookhaven National Laboratory Robert Broadwater EDD [email protected] 1 Finding Good Solutions for the Hard
root node level: internal node edge leaf node CS@VT Data Structures & Algorithms 2000-2009 McQuain
inary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each
UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS
UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction
Software Pipelining - Modulo Scheduling
EECS 583 Class 12 Software Pipelining - Modulo Scheduling University of Michigan October 15, 2014 Announcements + Reading Material HW 2 Due this Thursday Today s class reading» Iterative Modulo Scheduling:
Definitions. Software Metrics. Why Measure Software? Example Metrics. Software Engineering. Determine quality of the current product or process
Definitions Software Metrics Software Engineering Measure - quantitative indication of extent, amount, dimension, capacity, or size of some attribute of a product or process. Number of errors Metric -
Throughput constraint for Synchronous Data Flow Graphs
Throughput constraint for Synchronous Data Flow Graphs *Alessio Bonfietti Michele Lombardi Michela Milano Luca Benini!"#$%&'()*+,-)./&0&20304(5 60,7&-8990,.+:&;/&."!?@A>&"'&=,0B+C. !"#$%&'()* Resource
Satisfiability Checking
Satisfiability Checking SAT-Solving Prof. Dr. Erika Ábrahám Theory of Hybrid Systems Informatik 2 WS 10/11 Prof. Dr. Erika Ábrahám - Satisfiability Checking 1 / 40 A basic SAT algorithm Assume the CNF
On Tackling Virtual Data Center Embedding Problem
On Tackling Virtual Data Center Embedding Problem Md Golam Rabbani, Rafael Esteves, Maxim Podlesny, Gwendal Simon Lisandro Zambenedetti Granville, Raouf Boutaba D.R. Cheriton School of Computer Science,
Scheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li
Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order
WAR: Write After Read
WAR: Write After Read write-after-read (WAR) = artificial (name) dependence add R1, R2, R3 sub R2, R4, R1 or R1, R6, R3 problem: add could use wrong value for R2 can t happen in vanilla pipeline (reads
EECS 583 Class 11 Instruction Scheduling Software Pipelining Intro
EECS 58 Class Instruction Scheduling Software Pipelining Intro University of Michigan October 8, 04 Announcements & Reading Material Reminder: HW Class project proposals» Signup sheet available next Weds
Discuss the size of the instance for the minimum spanning tree problem.
3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can
Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:
Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove
Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths
Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths Satyajeet S. Ahuja, Srinivasan Ramasubramanian, and Marwan Krunz Department of ECE, University of Arizona, Tucson,
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
Chapter 5 Instructor's Manual
The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction
Bandwidth Allocation in a Network Virtualization Environment
Bandwidth Allocation in a Network Virtualization Environment Juan Felipe Botero [email protected] Xavier Hesselbach [email protected] Department of Telematics Technical University of Catalonia
! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
CSEE W4824 Computer Architecture Fall 2012
CSEE W4824 Computer Architecture Fall 2012 Lecture 2 Performance Metrics and Quantitative Principles of Computer Design Luca Carloni Department of Computer Science Columbia University in the City of New
FPGA-based Multithreading for In-Memory Hash Joins
FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded
System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination.
GCE Computing COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination 2510 Summer 2014 Version: 1.0 Further copies of this Report are available from aqa.org.uk
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Outline Dynamic Load Balancing framework in Charm++ Measurement Based Load Balancing Examples: Hybrid Load Balancers Topology-aware
Applied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
Concept of Cache in web proxies
Concept of Cache in web proxies Chan Kit Wai and Somasundaram Meiyappan 1. Introduction Caching is an effective performance enhancing technique that has been used in computer systems for decades. However,
Performance Metrics and Scalability Analysis. Performance Metrics and Scalability Analysis
Performance Metrics and Scalability Analysis 1 Performance Metrics and Scalability Analysis Lecture Outline Following Topics will be discussed Requirements in performance and cost Performance metrics Work
Solutions. Solution 4.1. 4.1.1 The values of the signals are as follows:
4 Solutions Solution 4.1 4.1.1 The values of the signals are as follows: RegWrite MemRead ALUMux MemWrite ALUOp RegMux Branch a. 1 0 0 (Reg) 0 Add 1 (ALU) 0 b. 1 1 1 (Imm) 0 Add 1 (Mem) 0 ALUMux is the
Introduction to Scheduling Theory
Introduction to Scheduling Theory Arnaud Legrand Laboratoire Informatique et Distribution IMAG CNRS, France [email protected] November 8, 2004 1/ 26 Outline 1 Task graphs from outer space 2 Scheduling
FPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
Topics of Chapter 5 Sequential Machines. Memory elements. Memory element terminology. Clock terminology
Topics of Chapter 5 Sequential Machines Memory elements Memory elements. Basics of sequential machines. Clocking issues. Two-phase clocking. Testing of combinational (Chapter 4) and sequential (Chapter
! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three
SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study
SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study Milan E. Soklic Abstract This article introduces a new load balancing algorithm, called diffusive load balancing, and compares its performance
How To Provide Qos Based Routing In The Internet
CHAPTER 2 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 22 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 2.1 INTRODUCTION As the main emphasis of the present research work is on achieving QoS in routing, hence this
Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)
Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,
Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)
Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems
CAD Algorithms. P and NP
CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using
Architectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet [email protected] October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
Probe Station Placement for Robust Monitoring of Networks
Probe Station Placement for Robust Monitoring of Networks Maitreya Natu Dept. of Computer and Information Science University of Delaware Newark, DE, USA, 97 Email: [email protected] Adarshpal S. Sethi
Data Structure [Question Bank]
Unit I (Analysis of Algorithms) 1. What are algorithms and how they are useful? 2. Describe the factor on best algorithms depends on? 3. Differentiate: Correct & Incorrect Algorithms? 4. Write short note:
Decentralized Utility-based Sensor Network Design
Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA [email protected], [email protected]
Big Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tu-berlin.de/marklv [email protected] Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
Preserving Message Integrity in Dynamic Process Migration
Preserving Message Integrity in Dynamic Process Migration E. Heymann, F. Tinetti, E. Luque Universidad Autónoma de Barcelona Departamento de Informática 8193 - Bellaterra, Barcelona, Spain e-mail: [email protected]
Traffic Engineering for Multiple Spanning Tree Protocol in Large Data Centers
Traffic Engineering for Multiple Spanning Tree Protocol in Large Data Centers Ho Trong Viet, Yves Deville, Olivier Bonaventure, Pierre François ICTEAM, Université catholique de Louvain (UCL), Belgium.
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
OAMulator. Online One Address Machine emulator and OAMPL compiler. http://myspiders.biz.uiowa.edu/~fil/oam/
OAMulator Online One Address Machine emulator and OAMPL compiler http://myspiders.biz.uiowa.edu/~fil/oam/ OAMulator educational goals OAM emulator concepts Von Neumann architecture Registers, ALU, controller
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!
Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel
John KW Hou, Ph.D. World- Wide Director, Industry Solutions Member of IBM Industry Academy IBM Corporation. José R. Favilla Jr
1 A Big Data Approach to Optimize Energy Efficiency in the Steel Industry John KW Hou, Ph.D. Manager, IBM Center of Competence, Metals IBM Corporation José R. Favilla Jr World- Wide Director, Industry
Parametric Attack Graph Construction and Analysis
Parametric Attack Graph Construction and Analysis Leanid Krautsevich Department of Computer Science, University of Pisa Largo Bruno Pontecorvo 3, Pisa 56127, Italy Istituto di Informatica e Telematica,
Topics. Flip-flop-based sequential machines. Signals in flip-flop system. Flip-flop rules. Latch-based machines. Two-sided latch constraint
Topics Flip-flop-based sequential machines! Clocking disciplines. Flip-flop rules! Primary inputs change after clock (φ) edge.! Primary inputs must stabilize before next clock edge.! Rules allow changes
Computer Organization and Components
Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
Path Selection Analysis in MPLS Network Based on QoS
Cumhuriyet Üniversitesi Fen Fakültesi Fen Bilimleri Dergisi (CFD), Cilt:36, No: 6 Özel Sayı (2015) ISSN: 1300-1949 Cumhuriyet University Faculty of Science Science Journal (CSJ), Vol. 36, No: 6 Special
Why have SSA? Birth Points (cont) Birth Points (a notion due to Tarjan)
Why have SSA? Building SSA Form SSA-form Each name is defined exactly once, thus Each use refers to exactly one name x 17-4 What s hard? Straight-line code is trivial Splits in the CFG are trivial x a
Approximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
A3 Computer Architecture
A3 Computer Architecture Engineering Science 3rd year A3 Lectures Prof David Murray [email protected] www.robots.ox.ac.uk/ dwm/courses/3co Michaelmas 2000 1 / 1 6. Stacks, Subroutines, and Memory
A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution
Scheduling MIMD parallel program A number of tasks executing serially or in parallel Lecture : Load Balancing The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that
Shared Backup Network Provision for Virtual Network Embedding
Shared Backup Network Provision for Virtual Network Embedding Tao Guo, Ning Wang, Klaus Moessner, and Rahim Tafazolli Centre for Communication Systems Research, University of Surrey Guildford, GU2 7XH,
