Intelligent Heuristic Construction with Active Learning

Size: px
Start display at page:

Download "Intelligent Heuristic Construction with Active Learning"

Transcription

1 Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U

2 Space is BIG! Hubble Ultra-Deep Field Tiny region of space shown Despite this, many galaxies Each galaxy, billions of stars Relevance to heuristics?

3 Optimisation spaces are MUCH BIGGER!!! We can t pick from Rough heuristics instead Atoms in the Universe Traditionally hard-coded Can take a year to perfect As if that wasn't bad enough Combinations of GCC Optimisations

4 the problem is even worse than that! Each architectural change requires heuristics to be re-tuned Heuristics are inherently tied to the underlying hardware Most compilers support many different platforms Very difficult to keep up and getting harder We already have out of date compilers

5 Machine Learning to the rescue? Leverage machine learning techniques to create heuristics Well suited to the problem Lots of interesting research Can be better than Humans But, it s also incredibly slow to learn We demonstrate how it s possible to accelerate training Create a heuristic which maps workload to processor

6 feature values Quick Detour: Machine Learning 101 Classification involves forming a correlation between the features of an object and its label examples Machine Learning Algorithm Model best heuristic value

7 Training a Heuristic thousands of examples input value 2 input value 1

8 Training a Heuristic thousands of examples Machine Learning Algorithm input value 2 input value 1

9 Training a Heuristic thousands of examples Machine Learning Algorithm input value 2 GPU CPU mathematical model input value 1

10 Using a Heuristic unseen features Mathematical Model input value 2 GPU CPU predicted processor input value 1

11 So what s wrong with this? feature 2 feature 1 Traditional approach almost universally adopted

12 Well, we actually only needed these! feature 2 feature 1

13 So this was a complete waste of time! feature 2 feature 1 Random sampling inevitably leads to redundancy

14 How much time was wasted? Correctness of labels are tied to heuristic quality I.e. consistently wrong labels leads to wrong model Sound data is essential, but very expensive E.g. are inputs X, Y, Z faster on CPU or GPU? 1. Run program on CPU using X, Y, Z 2. Run program on GPU using X, Y, Z 3. GOTO 1 until statistical difference observed

15 Compile-time Heuristics are Even Slower Labelling one single example requires iterative compilation compile code using different optimisation values repeated profiling to make statistically sound determination only then, associate best optimisation with code features.exe.c.exe best optimisation wins.exe

16 What do we do about it? We cannot know where the informative examples lie But, we can let the algorithm make an educated guess You and I do not learn in a random, unstructured way We build up our knowledge gradually and iteratively Perhaps, let the algorithm do the same?

17 Active Supervised Learning passive (random) thousands of random examples Machine Learning Algorithm final model

18 Active Supervised Learning passive (random) active (iterative) few random examples thousands of random examples ML Algorithm intermediate model Machine Learning Algorithm final model

19 Active Supervised Learning passive (random) active (iterative) few random examples thousands of random examples Machine Learning Algorithm ML Algorithm intermediate model completion reached? no carefully select an example final model yes final model

20 How do we know when it s complete? few random examples Many criteria, including time elapsed loop iterations ML Algorithm intermediate model carefully select an example cross-validation completion reached? no yes final model

21 What about selecting examples? few random examples Many algorithms available Used Query by Committee Easier to show than to tell ML Algorithm intermediate model carefully select an example completion reached? no yes final model

22 We start with a few random examples feature 2 feature 1

23 We form multiple intermediate models feature 2 feature 1

24 Each with a distinct algorithm feature 2 feature 1

25 A committee of different models feature 2 feature 1

26 Here the committee disagrees, but we use this to our advantage feature 2 feature 1 Disagreement regions hold the greatest potential to improve the collective knowledge learn from there!

27 So what example do we learn from next? feature 2 feature 1 We ask each model to predict the label of random unseen examples drawn from the feature space

28 Broadly the Committee will agree feature 2 feature 1

29 but we re interested in disagreement! feature 2 feature 1 Disagreement inevitably occurs around class boundaries

30 We select one of these examples to label properly feature 2 feature 1

31 Then rebuild the intermediate models feature 2 feature 1 Notice the region of disagreement has shrunk Eventually the distinct models will converge

32 Experimental Setup Demonstrate technique by creating an important heuristic Map workload to fastest device CPU or GPU Much studied problem, choosing poorly can drastically degrade performance Specifically, given inputs for Rodinia HotSpot, PathFinder, SRAD and Matrix Multiplication is it faster to use OpenMP (CPU) or OpenCL (GPU)? Compared number of training examples required to get high accuracy heuristic using passive versus active learning

33 A few gory details most in the paper Measured accuracy of randomly-trained vs. QBCtrained classifier using 500 test examples Intel Core i7 3.4GHz (8 HW Threads) NVIDIA Geforce GTX Titan (6GB) 12 distinct committee members 1 random example to begin 10,000 candidate examples 200 loop iterations

34 Random Training Examples 120 CPU GPU Sample Points Program Input Parameter Program Input Parameter

35 QBC Chosen Training Examples 120 CPU GPU Sample Points Program Input Parameter Program Input Parameter Same accuracy but quicker

36 Lights, Camera, Action... Region of Disagreement over time Shape of Model over time Shows ib1 algorithm refining a HotSpot model over time, using training examples chosen by a committee

37 It works 3x faster on average!

38 Summary Desperately need fast, reliable method to generate heuristics Current implementations rely on learning randomly Randomness is problematic because of labelling costs We show active learning is much more efficient 3x faster at creating heuristics to map program inputs to best processor in a heterogeneous system

Experiments in Unstructured Mesh Finite Element CFD Using CUDA

Experiments in Unstructured Mesh Finite Element CFD Using CUDA Experiments in Unstructured Mesh Finite Element CFD Using CUDA Graham Markall Software Performance Imperial College London http://www.doc.ic.ac.uk/~grm08 grm08@doc.ic.ac.uk Joint work with David Ham and

More information

GPU for Scientific Computing. -Ali Saleh

GPU for Scientific Computing. -Ali Saleh 1 GPU for Scientific Computing -Ali Saleh Contents Introduction What is GPU GPU for Scientific Computing K-Means Clustering K-nearest Neighbours When to use GPU and when not Commercial Programming GPU

More information

Analysis of GPU Parallel Computing based on Matlab

Analysis of GPU Parallel Computing based on Matlab Analysis of GPU Parallel Computing based on Matlab Mingzhe Wang, Bo Wang, Qiu He, Xiuxiu Liu, Kunshuai Zhu (School of Computer and Control Engineering, University of Chinese Academy of Sciences, Huairou,

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Big-data Analytics: Challenges and Opportunities

Big-data Analytics: Challenges and Opportunities Big-data Analytics: Challenges and Opportunities Chih-Jen Lin Department of Computer Science National Taiwan University Talk at 台 灣 資 料 科 學 愛 好 者 年 會, August 30, 2014 Chih-Jen Lin (National Taiwan Univ.)

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

GeoImaging Accelerator Pansharp Test Results

GeoImaging Accelerator Pansharp Test Results GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa The Impact of Memory Subsystem Resource Sharing on Datacenter Applications Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa Introduction Problem Recent studies into the effects of memory

More information

2: Computer Performance

2: Computer Performance 2: Computer Performance http://people.sc.fsu.edu/ jburkardt/presentations/ fdi 2008 lecture2.pdf... John Information Technology Department Virginia Tech... FDI Summer Track V: Parallel Programming 10-12

More information

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute

More information

Choosing a Computer for Running SLX, P3D, and P5

Choosing a Computer for Running SLX, P3D, and P5 Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line

More information

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

More information

How to choose a suitable computer

How to choose a suitable computer How to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and post-processing your data with Artec Studio. While

More information

Exploiting GPU Hardware Saturation for Fast Compiler Optimization

Exploiting GPU Hardware Saturation for Fast Compiler Optimization Exploiting GPU Hardware Saturation for Fast Compiler Optimization Alberto Magni School of Informatics University of Edinburgh United Kingdom a.magni@sms.ed.ac.uk Christophe Dubach School of Informatics

More information

Effective Java Programming. efficient software development

Effective Java Programming. efficient software development Effective Java Programming efficient software development Structure efficient software development what is efficiency? development process profiling during development what determines the performance of

More information

High Performance GPGPU Computer for Embedded Systems

High Performance GPGPU Computer for Embedded Systems High Performance GPGPU Computer for Embedded Systems Author: Dan Mor, Aitech Product Manager September 2015 Contents 1. Introduction... 3 2. Existing Challenges in Modern Embedded Systems... 3 2.1. Not

More information

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015 INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale

More information

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability

More information

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Jinglin Zhang, Jean François Nezan, Jean-Gabriel Cousin, Erwan Raffin To cite this version: Jinglin Zhang,

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

More information

Clustering Billions of Data Points Using GPUs

Clustering Billions of Data Points Using GPUs Clustering Billions of Data Points Using GPUs Ren Wu ren.wu@hp.com Bin Zhang bin.zhang2@hp.com Meichun Hsu meichun.hsu@hp.com ABSTRACT In this paper, we report our research on using GPUs to accelerate

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

SCALABILITY AND AVAILABILITY

SCALABILITY AND AVAILABILITY SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

Report Paper: MatLab/Database Connectivity

Report Paper: MatLab/Database Connectivity Report Paper: MatLab/Database Connectivity Samuel Moyle March 2003 Experiment Introduction This experiment was run following a visit to the University of Queensland, where a simulation engine has been

More information

MAGENTO HOSTING Progressive Server Performance Improvements

MAGENTO HOSTING Progressive Server Performance Improvements MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents

More information

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows* Get an Easy Performance Boost Even with Unthreaded Apps for Windows* Can recompiling just one file make a difference? Yes, in many cases it can! Often, you can achieve a major performance boost by recompiling

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

Parallelism and Cloud Computing

Parallelism and Cloud Computing Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication

More information

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Scalable Machine Learning - or what to do with all that Big Data infrastructure - or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

More information

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

More information

Data on Kernel Failures and Security Incidents

Data on Kernel Failures and Security Incidents Data on Kernel Failures and Security Incidents Ravishankar K. Iyer (W. Gu, Z. Kalbarczyk, G. Lyle, A. Sharma, L. Wang ) Center for Reliable and High-Performance Computing Coordinated Science Laboratory

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE SUBJECT: SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE KEYWORDS:, CORE, PROCESSOR, GRAPHICS, DRIVER, RAM, STORAGE SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE Below is a summary of key components of an ideal SolidWorks

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

A Review of Customized Dynamic Load Balancing for a Network of Workstations

A Review of Customized Dynamic Load Balancing for a Network of Workstations A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips Technology Update White Paper High Speed RAID 6 Powered by Custom ASIC Parity Chips High Speed RAID 6 Powered by Custom ASIC Parity Chips Why High Speed RAID 6? Winchester Systems has developed High Speed

More information

Autotuning dense linear algebra libraries on GPUs and overview of the MAGMA library

Autotuning dense linear algebra libraries on GPUs and overview of the MAGMA library Autotuning dense linear algebra libraries on GPUs and overview of the MAGMA library Rajib Nath, Stan Tomov, Jack Dongarra Innovative Computing Laboratory University of Tennessee, Knoxville Speaker: Emmanuel

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction

Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction Cristina Silvano cristina.silvano@polimi.it Politecnico di Milano HiPEAC CSW Athens 2014 Motivations System

More information

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content Advances in Networks, Computing and Communications 6 92 CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content Abstract D.J.Moore and P.S.Dowland

More information

Case Study on Productivity and Performance of GPGPUs

Case Study on Productivity and Performance of GPGPUs Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

More information

High-End Workstation Project Proposal

High-End Workstation Project Proposal High-End Workstation Project Proposal Introduction The modern day desktop workstation can provide computational power equivalent to the most powerful super computers 25 years ago. Current models provide

More information

Whitepaper: performance of SqlBulkCopy

Whitepaper: performance of SqlBulkCopy We SOLVE COMPLEX PROBLEMS of DATA MODELING and DEVELOP TOOLS and solutions to let business perform best through data analysis Whitepaper: performance of SqlBulkCopy This whitepaper provides an analysis

More information

Increasing Marketing ROI with Optimized Prediction

Increasing Marketing ROI with Optimized Prediction Increasing Marketing ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Smart marketers are using predictive analytics to make the best offer to the best customer for the least cost.

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Agile Performance Testing

Agile Performance Testing Agile Performance Testing Cesario Ramos Independent Consultant AgiliX Agile Development Consulting Overview Why Agile performance testing? Nature of performance testing Agile performance testing Why Agile

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

Parallelization: Binary Tree Traversal

Parallelization: Binary Tree Traversal By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First

More information

ST810 Advanced Computing

ST810 Advanced Computing ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Hyper ISE. Performance Driven Storage. XIO Storage. January 2013

Hyper ISE. Performance Driven Storage. XIO Storage. January 2013 Hyper ISE Performance Driven Storage January 2013 XIO Storage October 2011 Table of Contents Hyper ISE: Performance-Driven Storage... 3 The Hyper ISE Advantage... 4 CADP: Combining SSD and HDD Technologies...

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt. Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Research Report The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Executive Summary Information technology (IT) executives should be

More information

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark Data-parallel Acceleration of PARSEC Black-Scholes Benchmark AUGUST ANDRÉN and PATRIK HAGERNÄS KTH Information and Communication Technology Bachelor of Science Thesis Stockholm, Sweden 2013 TRITA-ICT-EX-2013:158

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster Jonatan Ward Sergey Andreev Francisco Heredia Bogdan Lazar Zlatka Manevska Eindhoven University of Technology,

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Pervasive Parallelism Laboratory Stanford University Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun Graph and its Applications Graph Fundamental

More information

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,

More information

Project INF BigData. Figure 1: Plot of the learned function from the checker board data set.

Project INF BigData. Figure 1: Plot of the learned function from the checker board data set. Project INF BigData Roberto Fontanarosa, Tobias Rupp, and Steffen Hirschmann Figure 1: Plot of the learned function from the checker board data set. Abstract Prediction and forecasting has become very

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

Speeding ETL Processing in Data Warehouses White Paper

Speeding ETL Processing in Data Warehouses White Paper Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are

More information

Table of Contents. June 2010

Table of Contents. June 2010 June 2010 From: StatSoft Analytics White Papers To: Internal release Re: Performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,

More information

WHY WORRY ABOUT PERFORMANCE IN E-COMMERCE SOLUTIONS?

WHY WORRY ABOUT PERFORMANCE IN E-COMMERCE SOLUTIONS? WHY WORRY ABOUT PERFORMANCE IN E-COMMERCE SOLUTIONS? Dr. Ed Upchurch & Dr. John Murphy Abstract This paper will discuss the evolution of computer systems, and will show that while the system performance

More information