Driving force. What future software needs. Potential research topics




Improving Software Robustness and Efficiency

Driving force
  Processor core clock speed has reached a practical limit of about 4 GHz (power issue).
  The percentage of transistors that can be kept active sustainably is decreasing; the increase in the number of transistors is not matched by a comparable reduction in power.
  There is a natural limit on the parallelizability of computation.
  Supplying data & instructions: 70%; arithmetic: 6% (Efficient Embedded Proc, Bill Dally 08).

Future projection
  Heterogeneous multicore and/or simplified caching

What future software needs
  Integrated programming environment for heterogeneous multicore
  Dealing with data movement among cores to exploit parallelization and improve performance & efficiency
  Efficiency becomes more important in future software

Potential research topics
  Software development environment and deployment/installation strategy & tools for enhancing software efficiency
  Integrated programming environment for heterogeneous (& homogeneous) multicores
  Performance-aware (latency predictability) multicore software development environment, important for CPS
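The "natural limit on parallelizability" mentioned above is commonly quantified with Amdahl's law; the slide does not name it, so treating it as the intended limit is an assumption. A minimal sketch, where `parallel_fraction` and `cores` are illustrative parameter names:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Limit of the speedup as the number of cores grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed example: 90% of the work can be parallelized.
    for cores in (1, 2, 4, 16, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
    print("ceiling:", round(speedup_ceiling(0.9), 2))
```

Under this model, adding cores past a certain point yields little benefit, which is one reason the slide shifts attention to data movement and software efficiency.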

Improving Software Robustness and Efficiency

Integrated programming environment for heterogeneous and homogeneous multicores
  Automated analysis for uncovering parallel constructs in highly threaded applications, including timing constraints (if in CPS), for both homogeneous & heterogeneous multicores
  Software development environment for heterogeneous multicores, including an environment for exploring the mapping of functions to various cores, and analysis & optimization toward the most efficient implementation given the application and available resources
  Dealing with data movement among cores to exploit parallelization and improve performance & efficiency

Performance and latency predictability for multicore software
  An integrated software development environment for CPS with control over data transfer sizes and buffer sizes, latency characterization, and support for tradeoff analysis between performance and predictability
  Additional signaling and preemption constructs can be introduced into the above to potentially improve throughput while still maintaining a reasonable latency guarantee.
  Software performance estimation/modeling may also help in analyzing, predicting, and controlling latency, which is important for real-time CPS

Software development & deployment strategy for enhancing efficiency
  Next slide
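The tradeoff between transfer/buffer size, throughput, and latency can be illustrated with a deliberately simple cost model. This is a sketch under stated assumptions, not a model taken from the slides: each transfer is assumed to cost a fixed overhead plus size divided by bandwidth, and all names and numbers are hypothetical.

```python
def transfer_latency(size_bytes: float, overhead_s: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Time to move one buffer: fixed per-transfer overhead plus wire time."""
    return overhead_s + size_bytes / bandwidth_bytes_per_s


def effective_throughput(size_bytes: float, overhead_s: float,
                         bandwidth_bytes_per_s: float) -> float:
    """Bytes per second achieved when buffers are sent back to back."""
    return size_bytes / transfer_latency(size_bytes, overhead_s,
                                         bandwidth_bytes_per_s)


def largest_buffer_within(latency_budget_s: float, overhead_s: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Largest buffer size (bytes) whose transfer still meets the budget."""
    return max(0.0, (latency_budget_s - overhead_s) * bandwidth_bytes_per_s)


if __name__ == "__main__":
    overhead, bandwidth = 1e-6, 1e9  # assumed: 1 us overhead, 1 GB/s link
    for size in (64, 1024, 16384, 262144):
        print(size,
              transfer_latency(size, overhead, bandwidth),
              effective_throughput(size, overhead, bandwidth))
```

In this model larger buffers amortize the fixed overhead and raise throughput, but each transfer takes longer, so a latency budget caps the usable buffer size. A real environment would need measured, not assumed, parameters.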

Architecture & Strategy for Development & Deployment

Current software development & deployment practice
  Encourages complexity and bloat
  Focuses on programmer productivity, maximizing reuse
  Layer upon layer of abstraction, libraries, frameworks & APIs: inefficient execution
  Libraries, frameworks and APIs are designed to be general purpose: large percentage of dead code

Consequences
  Unnecessary complexity: difficult to formally verify
  Wasted CPU cycles and runtime memory: unnecessary slowdown
  Provides fertile ground for return-oriented programming (ROP)

A Rube Goldberg approved software application is analyzed in: http://lcsd05.cs.tamu.edu/papers/sevitsky_et_al.pdf

Sevitsky et al. (IBM TJ Watson Research Center) on framework-based applications: "In every application we looked at, an enormous amount of activity was executed to accomplish simple tasks. For example, a stock brokerage benchmark executes 268 method calls and creates 70 new objects just to move a single date field from SOAP to Java."
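A toy illustration of the layering cost described above, assuming a hypothetical XML message with a `tradeDate` field. The layer functions, the message format, and the call counter are all invented for illustration; they are not the benchmark from the cited paper and the counts are not comparable to its figures.

```python
from datetime import date
import xml.etree.ElementTree as ET

CALLS = {"layered": 0, "direct": 0}  # application-level calls per path


def _tick(path):
    CALLS[path] += 1


# Layered path: small general-purpose layers stacked on each other.
def parse_envelope(xml_text):
    _tick("layered")
    return ET.fromstring(xml_text)


def envelope_to_generic_map(root):
    _tick("layered")
    return {child.tag: child.text for child in root}


def lookup_converter(type_name):
    _tick("layered")
    return {"date": date.fromisoformat}[type_name]


def convert_field(generic_map, field, type_name):
    _tick("layered")
    return lookup_converter(type_name)(generic_map[field])


def layered_read_date(xml_text):
    _tick("layered")
    generic_map = envelope_to_generic_map(parse_envelope(xml_text))
    return convert_field(generic_map, "tradeDate", "date")


# Collapsed path: specialized to this one message shape and field.
def direct_read_date(xml_text):
    _tick("direct")
    start = xml_text.index("<tradeDate>") + len("<tradeDate>")
    return date.fromisoformat(xml_text[start:start + 10])


if __name__ == "__main__":
    msg = "<order><tradeDate>2005-10-16</tradeDate></order>"
    print(layered_read_date(msg), direct_read_date(msg), CALLS)
```

The collapsed version does less work but only handles the one message shape it was specialized for, which is the productivity-versus-efficiency tension the slide describes.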

Layers Collapsing & Bloat Reduction

Efficient software: layer collapsing and program specialization

Software development & deployment strategy for enhancing efficiency
  Automated collapsing of redundant layers and bloat removal at compilation, installation, or run time, via compilation, binary rewriting, just-in-time compilation or runtime binary rewriting, for smaller, simpler, leaner, and more efficient executables.
  Software deployment strategies that reduce the use of dynamic link libraries (DLLs) and selectively compile in only the required part of a library statically (or derive a specialized library), by just-in-time compilation or binary rewriting. **
  Software development environment and methodology with support for ease of layer collapsing, while preserving the benefits of software reuse
  Better software maintenance control & tracking methods for reducing the potential for redundant logic and software bloat.

** The assumption taken in this research is that the storage required for code is relatively small compared to data, and current storage is relatively cheap; hence there is little reason to optimize code storage by using DLLs (storage was expensive in the past). This assumption applies to various, but not all, computing environments.

Improving Efficiency while Preserving Productivity
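One way to picture "compiling in only the required part of the library" is a reachability pass over a call graph: anything a library exports that no entry point can reach is a candidate for removal. The sketch below assumes the call graph is already available as a dictionary; the function names in the example are hypothetical, and this is not presented as the method used in the research.

```python
from collections import deque


def reachable_functions(call_graph, entry_points):
    """Breadth-first search from the entry points over caller -> callees."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prune_library(call_graph, library_functions, entry_points):
    """Split library functions into those to keep and those never reached."""
    live = reachable_functions(call_graph, entry_points)
    keep = sorted(f for f in library_functions if f in live)
    dead = sorted(f for f in library_functions if f not in live)
    return keep, dead


if __name__ == "__main__":
    # Hypothetical application using a small part of a general-purpose library.
    graph = {
        "main": ["lib.parse_date"],
        "lib.parse_date": ["lib.tokenize"],
        "lib.parse_currency": ["lib.tokenize", "lib.locale_table"],
    }
    library = ["lib.parse_date", "lib.parse_currency",
               "lib.tokenize", "lib.locale_table"]
    print(prune_library(graph, library, ["main"]))
```

A static call graph is only approximate when code uses function pointers, reflection, or dynamic loading, so a practical tool would have to treat such targets conservatively as live.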

Beyond an Executable

[Figure: diagram with boxes labeled A, C, Sw, Mach, State, Service, Protocol, Framework, Machine]

Post-compile-time specialization beyond a single executable
  Dynamic & interpreted code
  Protocol subsetting/specialization
  Framework specialization

Balancing: knowing when to specialize and when not to
Strategy: whole, partial, none
What else?

Improving Efficiency w/o Being Disruptive to Development Methodology