5MD00. Assignment Introduction. Luc Waeijen 16-12-2014





Contents
- EEG application
  - Background on EEG
  - Early Seizure Detection
  - Algorithm
  - Implementation Details
- Super Scalar
  - Assignment Description
  - Tooling (SimpleScalar & Wattch)
- Multi Core
  - Assignment Description
  - Tooling (Sniper Sim, McPAT & OpenMP)

Electroencephalography (EEG)
Let's plug some wires into a brain!!

EEG Signals

Uses
- Research
  - Determine functionality of the brain
  - Observe physical reaction to various stimuli
- Clinical
  - Monitoring during operations
  - Diagnosis and classification of epilepsy
  - Prognosticate coma patients
  - Test for brain damage
- Brain computer interface
  - Paralyzed people
  - Gaming
Can even predict seizures before they occur!

Early Seizure Detection

EEG devices are large and make you look like something from The Matrix!
- Epilepsy patients cannot walk the streets like this!
- Social stigma

But we are working on it!

Mobile EEG
- We are developing mobile EEG devices that can be implanted under the skin of the head.
- The result is practically invisible sensors, but with severe energy constraints!
- Early seizure detection has to be computed on the nodes; we cannot afford to transmit all data to an off-site processing unit.
- This is where you guys come in!

The Algorithm
- Analysis of the EEG time signal using a discrete wavelet transform over a time window (convolution of the time series with a wavelet function)
- Mathematically, a wavelet will correlate with the signal if the unknown signal contains information of similar frequency
- Conceptually similar to a Fourier transform (actually the Fourier transform is a special case of the wavelet transform)

Sliding Window

Mother Wavelet
In this algorithm we use the Mexican hat mother wavelet.
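The formula shown on the original slide did not survive in this transcript. For reference, a commonly used normalized form of the Mexican hat (Ricker) wavelet is given below. The normalization constant and the unit width are assumptions; the course code may scale the wavelet differently.

```latex
\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}} \left(1 - t^{2}\right) e^{-t^{2}/2}
```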

Daughter Wavelets and Coefficients
- The mother wavelet is scaled (s) and shifted (tau) to generate daughter wavelets
- We can now calculate wavelet coefficients for different s and tau by convolution with the daughter wavelet
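Written out, the scaling and shifting could look as follows. This is a sketch under one common convention: the factor 1/sqrt(s) is an assumed normalization, psi is the mother wavelet, y(t) is the input signal, and W(s, tau) is a symbol introduced here for the wavelet coefficient.

```latex
\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t - \tau}{s}\right),
\qquad
W(s,\tau) = \int_{-\infty}^{\infty} y(t)\,\psi_{s,\tau}(t)\,dt
```

Because the Mexican hat wavelet is real and symmetric, this correlation integral can equally be read as a convolution of y with the scaled wavelet, evaluated at shift tau.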

Normalizing and Selecting Max
- Normalized wavelet energy
- Select the index of the maximum, which is used as the indicator value for a given time window (that particular scale and shift dominated the window)
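The normalization formula from the slide is not preserved here, so the following is only a sketch of one plausible reading: each squared coefficient is divided by the total energy of the window, and the index of the largest value is returned. The function name, the row-major layout (one row per scale) and the normalization itself are assumptions, not taken from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: coeff holds n_scales * n_shifts wavelet coefficients,
 * stored row-major (one row per scale s, one column per shift tau).
 * Assumption: "normalized energy" = coeff^2 / sum of all coeff^2 in the window.
 * Writes the normalized energies to norm_energy and returns the flat index
 * of the maximum (scale = index / n_shifts, shift = index % n_shifts). */
size_t max_normalized_energy(const double *coeff, size_t n_scales,
                             size_t n_shifts, double *norm_energy)
{
    size_t n = n_scales * n_shifts;
    assert(coeff != NULL && norm_energy != NULL && n > 0);

    /* Total energy of the window over all scales and shifts. */
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += coeff[i] * coeff[i];

    /* Normalize and track the largest entry; an all-zero window yields 0. */
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        norm_energy[i] = (total > 0.0) ? coeff[i] * coeff[i] / total : 0.0;
        if (norm_energy[i] > norm_energy[best])
            best = i;
    }
    return best;
}
```

With 2 scales and 3 shifts, a returned index of 4 would correspond to scale 1 and shift 1.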

Implementation Trick
- The convolution theorem states that under suitable conditions the Fourier transform of a convolution is the pointwise product of Fourier transforms (http://en.wikipedia.org/wiki/convolution_theorem)
- The Fast Fourier Transform can be implemented pretty efficiently, so let's use that instead of convolution in the time domain!
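In symbols, with f and g as two signals and the calligraphic F denoting the Fourier transform (notation introduced here for illustration), the theorem reads:

```latex
\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
\quad\Longrightarrow\quad
f * g = \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \right\}
```

For sampled signals of length N the discrete Fourier transform gives a circular convolution. Computing it directly costs on the order of N^2 operations, while the FFT route costs on the order of N log N.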

Discrete Wavelet Transform using the Fast Fourier Transform (FFT)
1. Fourier-transform the input signal y(t) with the FFT
2. Generate the daughter wavelet for the chosen scaling factor s and FFT this wavelet
3. Multiply the FFT'd input and daughter wavelet (multiplication in the frequency domain is convolution in the time domain!)
4. The inverse FFT (IFFT) gives the wavelet coefficients for the different shifts (tau) and the chosen scaling s
5. Go back to step 2, until all discrete scaling values s are processed
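For one scale s, steps 1 to 4 compute a circular convolution of the window with the daughter wavelet. The direct time-domain version below is much slower than the FFT route, but it may be handy as a reference when checking an optimized implementation. The function name and the assumption that the daughter wavelet is already sampled into an array of the same length as the window are illustrative and not taken from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Direct circular convolution, O(n^2):
 *   out[tau] = sum over k of y[k] * w[(tau - k) mod n]
 * y   : input window (n samples)
 * w   : sampled daughter wavelet for one scale (n samples, assumed given)
 * out : wavelet coefficients for all shifts tau (n values) */
void circular_convolve(const double *y, const double *w, double *out, size_t n)
{
    assert(y != NULL && w != NULL && out != NULL && n > 0);

    for (size_t tau = 0; tau < n; tau++) {
        double acc = 0.0;
        for (size_t k = 0; k < n; k++) {
            /* tau + n - k stays non-negative, so the modulo is well defined. */
            size_t idx = (tau + n - k) % n;
            acc += y[k] * w[idx];
        }
        out[tau] = acc;
    }
}
```

Calling this once per scale and comparing against the IFFT output of step 4 (up to rounding) is one way to validate a faster version.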

Input data
- EEG signal sampled at 200 Hz
- Examine windows of 2 seconds
- Shift the window in steps of 1 second
- Wavelet transform and measurements taken from: "Non-parametric early seizure detection in an animal model of temporal lobe epilepsy", Sachin S. Talathi et al.
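At 200 Hz, a 2 second window holds 400 samples and a 1 second step is 200 samples, so consecutive windows overlap by half. A minimal sketch of the resulting window bookkeeping follows; the macro and function names are hypothetical and do not come from the provided code.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_RATE_HZ 200u                                /* from the slide */
#define WINDOW_SAMPLES ((size_t)(2u * SAMPLE_RATE_HZ))     /* 2 s -> 400 samples */
#define STEP_SAMPLES   ((size_t)(1u * SAMPLE_RATE_HZ))     /* 1 s -> 200 samples */

/* Number of complete windows that fit into n_samples samples. */
size_t count_windows(size_t n_samples, size_t window, size_t step)
{
    assert(window > 0 && step > 0);
    if (n_samples < window)
        return 0;
    return (n_samples - window) / step + 1;
}

/* Index of the first sample of window number w. */
size_t window_start(size_t w, size_t step)
{
    return w * step;
}
```

For 10 seconds of data (2000 samples), `count_windows(2000, WINDOW_SAMPLES, STEP_SAMPLES)` gives 9 windows, starting at samples 0, 200, ..., 1600.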

Electrically induced seizures in rats

Implementation
- You are given pure C code
- Ported from the MATLAB code provided with the Sachin S. Talathi paper
- Function and variable names are not always very clear (taken directly from the MATLAB code)
- The code is functional, but not at all optimized!

Call Graph
[Figure: call graph with the nodes DataLoader, Utilities, FFT, wavebase, Analysis, Main (loop over windows), Compute Wavelet, cwavelet, Inverse FFT and Generate output graph; part of the graph is marked as the Region of Interest]

Questions about the application?

Single Core Assignment

Single Core Assignment
Find the most suitable superscalar architectures for the early seizure detection algorithm in terms of performance (CPI) and energy-delay product, EDP = (CPI * Energy/Cycle) * CPI.
[Figure: EDP plotted against performance]
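The EDP expression can be read as energy per instruction (CPI times energy per cycle) multiplied by delay per instruction (CPI). A small sketch of that arithmetic follows; the function name is hypothetical, and the energy unit is whatever the power model reports.

```c
#include <assert.h>

/* EDP = (CPI * energy per cycle) * CPI, as stated on the slide.
 * cpi              : cycles per instruction
 * energy_per_cycle : average energy per cycle, in the power model's unit */
double energy_delay_product(double cpi, double energy_per_cycle)
{
    assert(cpi > 0.0 && energy_per_cycle >= 0.0);

    double energy_per_instruction = cpi * energy_per_cycle;
    return energy_per_instruction * cpi;
}
```

Because CPI enters twice, halving the CPI at the same energy per cycle reduces the EDP by a factor of four. A configuration that halves the CPI but more than quadruples the energy per cycle is therefore worse on this metric.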

Tools of the Trade
- SimpleScalar
  - Superscalar simulator
  - Designed for fast design space exploration (DSE) at architecture level
  - Used in many published papers
- Wattch
  - Architecture-level power estimation
  - Complementary to SimpleScalar

Simulation
- Functional: fast, but low detail
- Performance: slow, but high detail (also provides functional simulation, but much slower)

How to use?
- To get started, follow the steps on the website: http://www.es.ele.tue.nl/~mwijtvliet/5md00_sc/
- Once you can run the example, start exploring the design space!
- You can configure the processor in many ways; take a look at the tunable parameters at:
  - http://www.es.ele.tue.nl/~mwijtvliet/5md00_sc/?page=parameters
  - https://courses.cs.washington.edu/courses/cse471/06sp/hw/simplescalar3.0guide.pdf

Make sure you understand all the parameters before tuning
- If there is a parameter you don't understand, check the cheat sheet to see what it stands for.
- If you do not know the technique or term, google it!
- There is no point in tuning parameters if you do not know what they do.

Tunable Parameters
- Branch prediction

Approach
In your report we want to see why you chose certain configurations.
- Observe the performance
- Identify bottlenecks as well as possible (for example with sim-profile)
- Try to improve the bottlenecks and document in your report:
  - What you observed
  - What you tried and why you thought it would help
  - The effect of the tuning (did it work? If not, why?)

Questions regarding the single core assignment?

Multi Core Assignment

Multi Core Assignment
- Map the early seizure detection onto a multi core x86 platform
- Minimize the Energy-Delay-Area-Product (EDAP)
  - Parallelize the code
  - Tune the multi core platform (memory, interconnect, number of cores)
  - Optimize the given C code (e.g. loop transformations to increase locality)
  - Any other technique you can think of!
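As an illustration of a loop transformation that increases locality, the sketch below shows loop interchange on a row-major matrix. It is a generic example with hypothetical function names, not code from the seizure detection application.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the inner loop walks down a column, so consecutive accesses are
 * `cols` elements apart in memory and cache lines are poorly reused. */
double sum_strided(const double *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double sum = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}

/* After loop interchange: the inner loop walks along a row, so consecutive
 * accesses touch adjacent memory. The set of elements added is the same;
 * only the order of the accesses changes. */
double sum_contiguous(const double *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}
```

Whether such a change actually pays off depends on the cache configuration chosen for the platform, so it is worth measuring in the simulator rather than assuming.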

Tools of the Trade
- Sniper Sim
  - Tool to analyze the performance of multicore systems
  - Cores themselves are modeled on a functional level
  - Interconnect, memory and number of cores (i.e. the system level) can be configured and have performance models
- McPAT
  - Integrated power, area and timing modeling framework
  - Integrates with the Sniper Sim toolflow
- OpenMP
  - Framework to parallelize source code for multicore platforms

McPAT

How to Use
- Again, check the website to get started: http://www.es.ele.tue.nl/~mwijtvliet/5md00_mc/?page=preparation
- N.B. Download the application source again, since the makefile differs from the previous assignment!
- N.B. Extract the application source in the correct directory! Sniper Sim uses relative paths in its Makefiles! (see the guidelines page)
- For the tunable parameters, check the Sniper manual (in particular chapters 5 & 6).
- The website also explains how to extract the energy, delay and area metrics from the tools.

Again, make sure you understand what you tune!
- Read the Sniper Sim manual to understand the configuration options. In general the options are a bit more high level, and probably easier to understand.
- Procedure (for report):
  - Profile the application, identify bottlenecks
  - Try to improve the mapping/system
  - Document what you tried and what you were expecting
  - Show the results. Did your solution work? If not, try to explain why.
- Try to obtain the minimum EDAP!

OpenMP
N.B. OpenMP does not check (and certainly does not prove) whether the specified parallelization is correct (or beneficial). When a piece of code is annotated, the compiler simply assumes that it is OK to parallelize it. Beware of race conditions!
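To make the warning concrete, here is a small generic sketch (not taken from the application; the function names are hypothetical). Both functions compile without complaint, but in the first one all threads update the shared variable `sum` without synchronization, so the result may be wrong when run in parallel. The second one uses a `reduction` clause, which gives each thread a private partial sum and combines them at the end.

```c
#include <assert.h>
#include <stddef.h>

/* BROKEN ON PURPOSE, do not use: every thread reads and writes the shared
 * variable `sum`, which is a data race. OpenMP accepts the annotation anyway. */
double sum_squares_racy(const double *x, size_t n)
{
    assert(x != NULL);
    double sum = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Correct: reduction(+:sum) gives each thread its own copy of `sum` and
 * adds the copies together when the loop finishes. */
double sum_squares(const double *x, size_t n)
{
    assert(x != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * x[i];
    return sum;
}
```

With GCC the pragmas only take effect when compiling with `-fopenmp`; without that flag they are ignored and both functions run serially. Note that a reduction may add the terms in a different order than the serial loop, which can change floating-point results in the last digits.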

Finally
- Please use the forums to ask your questions!
  - Use a topic title that captures your problem as accurately as possible (i.e. not 'code not working').
  - Before you ask, check if someone else had a similar issue. Perhaps the answer to your question is already there.
- Remember, Google is your friend; it can typically provide answers much faster than the TAs.
- If you want to talk to a TA, send us an email to make an appointment.

Slides in this presentation were shamelessly copied from:
- http://snipersim.org/documents/2012-06-09%20Sniper%20ISCA%20Tutorial.pdf
- https://courses.cs.washington.edu/courses/cse471/06sp/hw/simpleScalar3.0guide.pdf
- http://www.simplescalar.com/docs/simple_tutorial_v2.pdf
- http://www.simplescalar.com/docs/simple_tutorial_v4.pdf