Experiences with Pattern-based Software Development




Ralf Jahr, University of Augsburg, Germany
parMERASA Dissemination Event, Barcelona, 23/09/2014

Outline
- Motivation
- Parallelization with Parallel Design Patterns (PDPs)
- Applied Concepts
- parMERASA Pattern Catalogue with PDPs and Analyzable Synchronization Idioms
- Pattern-supported Parallelization Approach
- Overview
- Model-based Optimization
- Timing-analyzable Algorithmic Skeletons (TAS)
- Measures for Effectiveness of Parallelization
- Summary & Conclusion

Requirements for Static Timing Analysis of Parallel Software
- Main challenge: interference between threads
  - By hardware, e.g. memory, NoC
  - By software for coordination and synchronization, e.g. locks, barriers
- "The structured approach to parallelism proposes that commonly used patterns of computation and interaction should be abstracted as parameterisable library [...]" (Murray Cole)
- Solution:
  - Predictable hardware: parMERASA processor architecture
  - Predictable parallel software:
    - Analyzable synchronization primitives: ticket lock, f&i-barrier
    - Structured parallelism by parallel design patterns
    - Coding guidelines for predictable sequential and parallel program parts
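The ticket lock named on this slide can be sketched as follows. This is a minimal, generic C11 version for illustration only, not the parMERASA implementation; the names `ticket_lock_t`, `ticket_lock_acquire`, `ticket_lock_release` and `ticket_lock_demo` are hypothetical. Threads enter the critical section in the order in which they drew their tickets, so a thread waits for at most as many critical sections as there are competing threads, which is the property that makes waiting times boundable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Ticket lock: strict FIFO service order (illustrative sketch). */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket that may enter the critical section */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *lock)
{
    atomic_init(&lock->next_ticket, 0);
    atomic_init(&lock->now_serving, 0);
}

static void ticket_lock_acquire(ticket_lock_t *lock)
{
    /* Fetch-and-increment draws a unique ticket. */
    unsigned my_ticket = atomic_fetch_add(&lock->next_ticket, 1);
    while (atomic_load(&lock->now_serving) != my_ticket) {
        /* Assumption: yield so the demo also runs on a desktop OS with
           fewer cores than threads; a one-thread-per-core real-time
           target would typically just spin here. */
        sched_yield();
    }
}

static void ticket_lock_release(ticket_lock_t *lock)
{
    /* Hand the lock to the holder of the next ticket. */
    atomic_fetch_add(&lock->now_serving, 1);
}

/* Hypothetical demo: four threads increment a shared counter. */
enum { LOCK_DEMO_THREADS = 4, LOCK_DEMO_ITERATIONS = 1000 };
static ticket_lock_t lock_demo_lock;
static long lock_demo_counter;

static void *lock_demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOCK_DEMO_ITERATIONS; i++) {
        ticket_lock_acquire(&lock_demo_lock);
        lock_demo_counter++;
        ticket_lock_release(&lock_demo_lock);
    }
    return NULL;
}

long ticket_lock_demo(void)
{
    pthread_t threads[LOCK_DEMO_THREADS];

    ticket_lock_init(&lock_demo_lock);
    lock_demo_counter = 0;
    for (int i = 0; i < LOCK_DEMO_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, lock_demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < LOCK_DEMO_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return lock_demo_counter;
}
```

Calling `ticket_lock_demo()` should return the number of threads times the number of iterations if mutual exclusion holds.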

Parallel Design Patterns
- Defined structures for computation and interaction
- Describe best-practice solutions
- Enforce structured parallelization

parMERASA Pattern Catalogue
- Assessment of PDPs and SIs for determinism and timing-analyzability:
  - PDPs only with defined SIs, defined synchronization points
  - Static mapping of tasks/code to cores
  - Little or no usage of (shared) dynamic data structures
- Description of PDPs and SIs with extended metapatterns in the parMERASA Pattern Catalogue:
  - Synchronization Idioms
  - Real-time prerequisites
  - WCET hints
[M. Gerdes, R. Jahr, and T. Ungerer. parMERASA pattern catalogue: Timing Predictable Parallel Design Patterns. Technical Report 2013-11, Department of Computer Science, University of Augsburg, Augsburg, Germany, 2013.]

PDPs in Applications
- PDPs (table columns): Task Parallelism, Periodic Task Parallelism, Data Parallel, Parallel Pipeline
- Applications (table rows, two PDPs marked for each): HON: 3DPP, HON: Stereo-Nav, BMA: Crawler Crane, DNDE: Diesel EMS
- Synchronization Idioms: ticket lock and f&i-barrier
  - Fairness
  - Analyzability

How to get Parallelism with PDPs?
- Parallelization of existing sequential single-core code:
  - Pattern-supported parallelization approach
  - Experience shows: simple PDPs are enough
- Starting from scratch:
  - Reveal parallelism in the whole software development process
  - PDPs might be from higher levels
- Modelling by Activity and Pattern Diagram (APD): extended UML2 Activity Diagram with PDP nodes

Parallelization Approach
- Starting point:
  - Single-core program ("legacy code")
  - Pattern Catalogue with predictable PDPs and SIs
- Phase 1: Targeting High Degree of Parallelism
  - Create model to reveal parallelism: Activity and Pattern Diagram (APD)
  - Model consisting of sequential parts and PDPs
  - Platform independent
- Phase 2: Targeting Optimal Parallelism
  - Agglomeration of nodes, definition of parameters
  - Platform dependent
[R. Jahr, M. Gerdes, and T. Ungerer. A pattern-supported parallelization approach. In Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM '13, Shenzhen, China, 2013. ACM.]

Activity and Pattern Diagram (APD)
[Figure: APD from HON 3DPP showing a Parallel Pipeline with three stages; annotated elements: pins for reading/writing shared data, sequential activities, Parallel Design Pattern]
- Derived from UML2 Activity Diagram:
  - PDP as additional node type
  - Fork/join operator not allowed -> structured parallelism only
  - Activities/PDPs can encapsulate a single/multiple APD(s)

Impact on Timing Analysis
- Precondition: analyzable sequential single-core code
- Only a defined set of assessed Parallel Design Patterns (PDPs) and synchronization idioms (SIs):
  - Static mapping of tasks to threads and cores
  - Only defined synchronization idioms: f&i-barrier, ticket locks
  - Known synchronization points
  - Limited usage of (shared) dynamic data structures
  - No indeterminism by race conditions
- Conclusion: parallel software should also be analyzable
- But: implementation of PDPs is not fixed, analyzability is only defined for the source -> Timing Analyzable Algorithmic Skeletons (TAS)
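To illustrate what a known synchronization point looks like in code, here is a minimal sketch of a centralized barrier built on fetch-and-increment, which is presumably what "f&i" abbreviates. It is a generic textbook construction under that assumption, not the parMERASA barrier; `fi_barrier_t`, `fi_barrier_wait` and `fi_barrier_demo` are hypothetical names. The number of participants is fixed at initialization, in line with the static mapping of tasks to threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Centralized barrier using fetch-and-increment (illustrative sketch). */
typedef struct {
    atomic_uint arrived;     /* threads that reached the barrier this round */
    atomic_uint generation;  /* incremented once per completed round */
    unsigned parties;        /* fixed number of participating threads */
} fi_barrier_t;

static void fi_barrier_init(fi_barrier_t *barrier, unsigned parties)
{
    assert(parties > 0);
    atomic_init(&barrier->arrived, 0);
    atomic_init(&barrier->generation, 0);
    barrier->parties = parties;
}

static void fi_barrier_wait(fi_barrier_t *barrier)
{
    unsigned my_generation = atomic_load(&barrier->generation);

    if (atomic_fetch_add(&barrier->arrived, 1) + 1 == barrier->parties) {
        /* Last arrival: reset the counter first, then release the others. */
        atomic_store(&barrier->arrived, 0);
        atomic_fetch_add(&barrier->generation, 1);
    } else {
        while (atomic_load(&barrier->generation) == my_generation) {
            sched_yield();  /* assumption: desktop OS demo, see lead-in */
        }
    }
}

/* Hypothetical demo: phase 2 reads what a neighbour wrote in phase 1. */
enum { BARRIER_DEMO_THREADS = 4 };
static fi_barrier_t barrier_demo_barrier;
static int barrier_demo_phase1[BARRIER_DEMO_THREADS];
static int barrier_demo_phase2[BARRIER_DEMO_THREADS];

static void *barrier_demo_worker(void *arg)
{
    int id = *(int *)arg;

    barrier_demo_phase1[id] = id + 1;
    fi_barrier_wait(&barrier_demo_barrier);  /* known synchronization point */
    barrier_demo_phase2[id] =
        barrier_demo_phase1[(id + 1) % BARRIER_DEMO_THREADS];
    return NULL;
}

int fi_barrier_demo(void)
{
    pthread_t threads[BARRIER_DEMO_THREADS];
    int ids[BARRIER_DEMO_THREADS];
    int sum = 0;

    fi_barrier_init(&barrier_demo_barrier, BARRIER_DEMO_THREADS);
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++) {
        ids[i] = i;
        barrier_demo_phase1[i] = 0;
        barrier_demo_phase2[i] = 0;
    }
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, barrier_demo_worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++) {
        pthread_join(threads[i], NULL);
        sum += barrier_demo_phase2[i];
    }
    return sum;
}
```

In the demo every thread reads its neighbour's phase 1 value only after the barrier, so the sum returned by `fi_barrier_demo()` is 1 + 2 + 3 + 4 = 10.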

Effects of PDPs on WCET Analysis
- Case study: HON 3DPP
- PDPs from the parMERASA Pattern Catalogue enforce use of analyzable SIs only
- Comparison of existing parallel implementation with implementation based on PDPs:
  - Smaller WCET preparation effort with PDPs
  - Comparable ACET performance of both implementations
  - Better WCET performance of implementation based on PDPs
[R. Jahr, M. Gerdes, T. Ungerer, H. Ozaktas, C. Rochange, and P. G. Zaykov. Effects of structured parallelism by parallel design patterns on embedded hard real-time systems. In 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Chongqing, China, 2014.]

Assistance for Parallelization
[Figure: overview of the assistance for parallelization, with parts labelled 1, 2 and 3]

Model-based Optimization (1)
- Idea: find best assignment of threads to PDP instances
- Performance model for parMERASA
- Objectives: minimize number of threads, minimize WCET estimation, minimize accesses to/number of shared variables
- Implementation: optimization with Java and SMPSO
- Input: XML format briefly describing APDs and list of accesses to variables
- Case study: BMA crawler crane control code
[R. Jahr, M. Frieb, M. Gerdes, and T. Ungerer. Model-based Parallelization and Optimization of an Industrial Control Code. In Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung eingebetteter Systeme X, 2014]
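With three objectives to minimize there is usually no single best thread assignment, only a set of non-dominated trade-offs. The sketch below shows the Pareto dominance comparison on which multi-objective optimizers of this kind rely. It does not reproduce SMPSO or the Java tool described on the slide; the type `candidate_t`, the function names and all numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One evaluated thread assignment; all three objectives are minimized. */
typedef struct {
    int threads;          /* number of threads used */
    long wcet_estimate;   /* estimated WCET, e.g. in cycles */
    int shared_accesses;  /* accesses to shared variables */
} candidate_t;

/* a dominates b if it is no worse in every objective and better in one. */
bool dominates(const candidate_t *a, const candidate_t *b)
{
    bool no_worse = a->threads <= b->threads &&
                    a->wcet_estimate <= b->wcet_estimate &&
                    a->shared_accesses <= b->shared_accesses;
    bool better = a->threads < b->threads ||
                  a->wcet_estimate < b->wcet_estimate ||
                  a->shared_accesses < b->shared_accesses;
    return no_worse && better;
}

/* Copies all non-dominated candidates to front and returns their count. */
size_t pareto_front(const candidate_t *all, size_t count, candidate_t *front)
{
    size_t kept = 0;

    assert(all != NULL && front != NULL);
    for (size_t i = 0; i < count; i++) {
        bool dominated = false;
        for (size_t j = 0; j < count && !dominated; j++) {
            dominated = (j != i) && dominates(&all[j], &all[i]);
        }
        if (!dominated) {
            front[kept++] = all[i];
        }
    }
    return kept;
}

/* Hypothetical demo with made-up objective values. */
size_t pareto_demo(void)
{
    static const candidate_t candidates[] = {
        {4, 1000, 20},
        {2, 1500, 10},
        {4, 1200, 25},  /* dominated by the first candidate */
        {8,  900, 40},
    };
    candidate_t front[4];

    return pareto_front(candidates, 4, front);
}
```

In the demo three of the four candidates remain on the Pareto front; choosing among them is a design decision that the optimizer leaves to the developer.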

Timing-analyzable Algorithmic Skeletons (TAS) (2)
- Implementation of PDPs as library (pthreads and parMERASA): Data Parallel, Task Parallelism, Pipeline
- Parameterized implementation
- Effects:
  - Less effort for parallelization
  - Known implementation of PDPs improves WCET
  - Strong positive effects on software maintainability!
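The idea of a parameterized skeleton that hides all thread handling can be sketched for the Data Parallel pattern as follows. This is not the TAS library API; `dp_execute`, `dp_kernel_fn` and the other names are hypothetical, and the sketch only assumes plain pthreads. The iteration range is split statically into one contiguous chunk per worker, and no dynamic memory is used.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { DP_MAX_WORKERS = 8 };

/* User-supplied sequential code for one element. */
typedef void (*dp_kernel_fn)(int index, void *data);

typedef struct {
    dp_kernel_fn kernel;
    void *data;
    int begin;  /* first index of this worker's chunk */
    int end;    /* one past the last index */
} dp_chunk_t;

static void *dp_worker(void *arg)
{
    dp_chunk_t *chunk = arg;

    for (int i = chunk->begin; i < chunk->end; i++) {
        chunk->kernel(i, chunk->data);
    }
    return NULL;
}

/* Runs kernel for indices 0..n-1 on a fixed number of workers. */
void dp_execute(dp_kernel_fn kernel, void *data, int n, int workers)
{
    pthread_t threads[DP_MAX_WORKERS];
    dp_chunk_t chunks[DP_MAX_WORKERS];
    int per_worker;

    assert(n >= 0 && workers >= 1 && workers <= DP_MAX_WORKERS);
    per_worker = (n + workers - 1) / workers;  /* static chunk size */

    for (int w = 0; w < workers; w++) {
        int begin = w * per_worker;
        int end = begin + per_worker;

        chunks[w].kernel = kernel;
        chunks[w].data = data;
        chunks[w].begin = begin < n ? begin : n;
        chunks[w].end = end < n ? end : n;
    }
    /* Chunks 1..workers-1 run on new threads, chunk 0 on the caller. */
    for (int w = 1; w < workers; w++) {
        int rc = pthread_create(&threads[w], NULL, dp_worker, &chunks[w]);
        assert(rc == 0);
        (void)rc;
    }
    dp_worker(&chunks[0]);
    for (int w = 1; w < workers; w++) {
        pthread_join(threads[w], NULL);  /* single, known join point */
    }
}

/* Hypothetical usage: square ten numbers in place on three workers. */
static void square_kernel(int index, void *data)
{
    int *values = data;

    values[index] = values[index] * values[index];
}

int dp_demo(void)
{
    int values[10];
    int sum = 0;

    for (int i = 0; i < 10; i++) {
        values[i] = i;
    }
    dp_execute(square_kernel, values, 10, 3);
    for (int i = 0; i < 10; i++) {
        sum += values[i];
    }
    return sum;
}
```

The application code consists only of `square_kernel` and one call to `dp_execute`; the parallel structure is confined to the skeleton, which is the property the slide credits for lower parallelization effort and better maintainability.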

Support for Timing Analysis in TAS: Support for OTAWA
- Observations:
  - Parallel code completely hidden in skeletons
  - WCET analysis: IDs mainly necessary in TAS code, exception: function call for skeleton execution
- Simplification for OTAWA:
  - Compact TAS XML format describing TAS instances
  - Translator to OTAWA XML

Effectiveness of Parallelization (3)
- Embedded systems must guarantee defined deadlines, even on multi-core
- ASAP is not the goal! Instead: do more or achieve better results, consume less power, have more regular update intervals
- Suggested measures:
  - Slack time: potential to compute more
  - Speedup!
  - Jitter in update intervals
  - Possible reduction of CPU clock rate
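The suggested measures can be written down as simple formulas. The definitions below are one plausible reading, not taken from the slides: slack as deadline minus WCET, speedup as the ratio of sequential to parallel WCET, jitter as the spread between the longest and shortest update interval, and the clock reduction under the simplifying assumption that execution time scales inversely with clock rate. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Slack time: time left before the deadline in the worst case. */
double slack_time(double deadline, double wcet)
{
    return deadline - wcet;
}

/* Speedup of the worst case: sequential WCET over parallel WCET. */
double wcet_speedup(double wcet_sequential, double wcet_parallel)
{
    assert(wcet_parallel > 0.0);
    return wcet_sequential / wcet_parallel;
}

/* Jitter: difference between longest and shortest update interval. */
double update_jitter(const double *intervals, size_t count)
{
    double shortest;
    double longest;

    assert(intervals != NULL && count > 0);
    shortest = intervals[0];
    longest = intervals[0];
    for (size_t i = 1; i < count; i++) {
        if (intervals[i] < shortest) {
            shortest = intervals[i];
        }
        if (intervals[i] > longest) {
            longest = intervals[i];
        }
    }
    return longest - shortest;
}

/* Fraction of the current clock rate that would still meet the deadline.
   Assumption: execution time is inversely proportional to clock rate,
   which ignores memory and interconnect latencies. */
double min_clock_fraction(double wcet, double deadline)
{
    assert(deadline > 0.0);
    return wcet / deadline;
}
```

For example, a task with a parallel WCET of 4 ms and a deadline of 8 ms has 4 ms of slack, or could under the stated assumption run at half the clock rate.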

Open Source Software
https://github.com/parmerasa-uau/
- Timing Analyzable Algorithmic Skeletons (TAS)
  - Predictable PDPs implemented in C with pthreads
  - Support for timing analysis with OTAWA
- Model-based Optimization
  - Find best assignment of threads to PDP/TAS instances
  - Multi-objective optimization with SMPSO
- Further information: [R. Jahr, A. Stegmeier, R. Kiefhaber, M. Frieb, and T. Ungerer. User Manual for the Optimization and WCET Analysis of Software with Timing Analyzable Algorithmic Skeletons. Technical Report 2014-05, Department of Computer Science, University of Augsburg, Augsburg, Germany, 2014.]

Summary
- Structured parallelism by Parallel Design Patterns (PDPs) and analyzable Synchronization Idioms (SIs)
- Predictable PDPs and SIs in Pattern Catalogue
- Parallelization approach with UML modelling and PDPs
- Support of parallelization by skeletons and tools
- Such structured parallelism leads to timing-analyzable parallel code, less effort for WCET analysis and better WCET estimates

Conclusion
- Path from sequential to parallel software covered well
- Parallelization is feasible also with support of WCET analysis
- Tool support is important (analysis, optimization, implementation)