Sudoku Solution on Many-core Processors
1 Acceleration of Genetic Algorithms for Sudoku Solution on Many-core Processors. Yuji Sato*1, Naohiro Hasegawa*1, Mikiko Sato*2 (*1: Hosei University, *2: Tokyo University of A&T)
2 Outline: Background; Sudoku solution accuracy by GA; Accelerating genetic computation with many-core processors (GPU/MCP); Evaluation tests; Conclusion
3 Background: Objective. Evolutionary computation + parallel processing + many-core architecture, to achieve practical processing time.
4 Benchmark: Sudoku puzzles. As a first step towards that objective, we take the problem of solving Sudoku puzzles and investigate how to accelerate the processing with a GPU/MCP.
5 The reasons for this approach (1): Sudoku puzzles are popular throughout the world.
6 The reasons for this approach (2): Genetic computation is well suited to parallelization. Therefore, increasing the number of processor cores may bring the processing time of GAs down to that of backtracking algorithms.
7 The reasons for this approach (3): GPUs were designed for processing computer graphics in games. However, research on general-purpose computation on graphics processing units (GPGPU) has begun, and GPUs can also be used to help solve logic puzzles.
8 Sudoku Solution by GA: An example of a Sudoku puzzle. Fig. 1. An example of a Sudoku puzzle: 24 positions contain a given number, and the remaining positions must be solved. A Sudoku puzzle is completed by filling in all of the empty cells with the numerals 1 to 9 so that each row, each column, and each 3 x 3 sub-block contains every numeral exactly once.
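The completion rule can be checked mechanically. The following is a minimal sketch (the function name `is_valid_solution` is ours, not from the slides) that tests whether a filled 9 x 9 grid, given as a list of nine lists of ints, satisfies the standard row, column, and sub-block constraints.

```python
def is_valid_solution(grid):
    """Return True if a filled 9x9 grid is a completed Sudoku."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    # The nine 3x3 sub-blocks, scanned left to right, top to bottom.
    blocks = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column and sub-block must contain exactly the numerals 1..9.
    return all(group == digits for group in rows + cols + blocks)
```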
9 Research example: conventional design of the chromosome. Fig. 4. An example of the conventional chromosome design and crossover operation. The chromosome is defined as a one-dimensional array of 81 numbers divided into nine sub-blocks, and crossover points can only appear between sub-blocks.
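To make the conventional design concrete, here is a sketch of a one-point crossover on the flat 81-gene chromosome. It assumes the array stores the nine sub-blocks consecutively (genes 9b to 9b+8 belong to sub-block b) and that a single cut is used; the slides only state that cut points fall between sub-blocks, so the one-point choice and the name `block_boundary_crossover` are ours.

```python
import random

BLOCK = 9  # genes per sub-block in the flat 81-gene chromosome


def block_boundary_crossover(parent_a, parent_b, rng=random):
    """One-point crossover whose cut falls only between sub-blocks."""
    assert len(parent_a) == len(parent_b) == 81
    # Choose one of the 8 interior sub-block boundaries (9, 18, ..., 72).
    cut = rng.randint(1, 8) * BLOCK
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Because whole sub-blocks are exchanged, each sub-block stays intact, but a row or column of cells that spans several sub-blocks (the long schemata discussed next) can be split by the cut.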
10 The problem addressed here: This design generates chromosomes containing highly fit schemata of long defining length, built from rows or columns of cells that run across sub-blocks, and these highly fit schemata (building blocks, BBs) are easily destroyed by the crossover operation.
11 Basic concept: Genetic operations that emphasize the preservation of BBs, and an improved local search function.
12 Method of Applying GAs: Definition of chromosomes. We define the 9 x 9 two-dimensional array as the GA chromosome. The cells that do not contain given values are filled with random numerals.
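The slide says only that empty cells receive random numerals. The sketch below assumes a stricter variant: each 3 x 3 sub-block is filled with exactly the numerals it is missing, in random order, so every sub-block already holds 1 to 9. That assumption fits the later sub-block swap mutation and the row/column-only fitness, but it is our reading, and `init_chromosome` is a hypothetical name.

```python
import random


def init_chromosome(givens, rng=random):
    """Build a 9x9 chromosome from a puzzle grid where 0 marks an empty cell.

    Assumption: the empty cells of each sub-block receive that sub-block's
    missing numerals in random order (givens must not repeat within a block).
    """
    grid = [row[:] for row in givens]
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            cells = [(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)]
            present = {grid[r][c] for r, c in cells if grid[r][c] != 0}
            missing = [d for d in range(1, 10) if d not in present]
            rng.shuffle(missing)  # random order of the missing numerals
            empty = [(r, c) for r, c in cells if grid[r][c] == 0]
            for (r, c), d in zip(empty, missing):
                grid[r][c] = d
    return grid
```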
13 The fitness function: $f(x) = \sum_{i=1}^{9} g_i(x) + \sum_{j=1}^{9} h_j(x)$, where $g_i(x) = |x_i|$ and $h_j(x) = |x_j|$ are the numbers of different numerals in row $i$ and column $j$, respectively. The sum of all row and column scores is the score of the individual.
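The fitness function translates directly into code. In this sketch (the name `fitness` is ours), each row and column contributes its count of distinct numerals, so the maximum is 9 x 9 + 9 x 9 = 162, reached when every row and column holds nine different numerals.

```python
def fitness(grid):
    """f(x): distinct numerals per row plus distinct numerals per column."""
    row_score = sum(len(set(row)) for row in grid)                             # sum of g_i(x)
    col_score = sum(len({grid[r][c] for r in range(9)}) for c in range(9))    # sum of h_j(x)
    return row_score + col_score
```

Sub-blocks are not scored; if the genetic operators keep every sub-block a permutation of 1 to 9 (an assumption consistent with the sub-block swap mutation), a score of 162 means the puzzle is solved.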
14 The fitness function: score of the rows that constitute the sub-blocks.
15 Crossover. Fig. 3. An example of a crossover that considers the rows or the columns that constitute the sub-blocks. The child inherits the rows (or columns) of sub-blocks with the highest score.
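One way to read Fig. 3 is sketched below: child 1 takes each horizontal band of three sub-blocks from whichever parent scores higher on those three rows, and child 2 does the same for each vertical stack using column scores. Building one child from rows and the other from columns, and breaking ties in favour of parent A, are our assumptions; all function names are hypothetical.

```python
def band_row_score(grid, band):
    """Distinct numerals summed over the three rows of one band (0..2)."""
    return sum(len(set(grid[r])) for r in range(3 * band, 3 * band + 3))


def stack_col_score(grid, stack):
    """Distinct numerals summed over the three columns of one stack (0..2)."""
    return sum(len({grid[r][c] for r in range(9)})
               for c in range(3 * stack, 3 * stack + 3))


def subblock_crossover(pa, pb):
    """Return two children that inherit the higher-scoring bands / stacks."""
    child1 = [None] * 9
    for band in range(3):
        src = pa if band_row_score(pa, band) >= band_row_score(pb, band) else pb
        for r in range(3 * band, 3 * band + 3):
            child1[r] = src[r][:]
    child2 = [[0] * 9 for _ in range(9)]
    for stack in range(3):
        src = pa if stack_col_score(pa, stack) >= stack_col_score(pb, stack) else pb
        for r in range(9):
            for c in range(3 * stack, 3 * stack + 3):
                child2[r][c] = src[r][c]
    return child1, child2
```

Because bands and stacks consist of whole sub-blocks, this operator never cuts a sub-block, and the rows or columns that form building blocks are inherited intact.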
16 Mutation: swap mutation inside a sub-block. Fig. 5. An example of the swap mutation. Two numbers inside a sub-block are selected at random and swapped, provided that neither is a given number.
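A sketch of the swap mutation follows. It assumes a boolean mask `fixed` marking the given cells and picks the sub-block uniformly at random; both details, and the name `swap_mutation`, are illustrative rather than taken from the slides.

```python
import random


def swap_mutation(grid, fixed, rng=random):
    """Swap two non-given cells inside one randomly chosen 3x3 sub-block.

    fixed[r][c] is True for given cells. Returns a new grid; if the chosen
    sub-block has fewer than two free cells, the copy is returned unchanged.
    """
    child = [row[:] for row in grid]
    br, bc = 3 * rng.randrange(3), 3 * rng.randrange(3)
    free = [(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)
            if not fixed[r][c]]
    if len(free) < 2:
        return child
    (r1, c1), (r2, c2) = rng.sample(free, 2)
    child[r1][c1], child[r2][c2] = child[r2][c2], child[r1][c1]
    return child
```

Since the swap stays inside one sub-block, the multiset of numerals in every sub-block is unchanged, which preserves the sub-block constraint.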
17 Local search: Multiple Offspring Sampling (MOS). The parents produce several children by crossover and mutation, and two of these children are selected.
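The MOS step can be written generically, with the crossover, mutation, and fitness operators passed in. This sketch assumes that the two fittest candidates are the ones kept and that "child candidates/parents: 2" means two crossover pairs per pair of parents; the function name and the `n_pairs` parameter are hypothetical. A mutation rate (0.3 in the experiments) would be applied inside the `mutate` callable.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def multiple_offspring_sampling(
    pa: T,
    pb: T,
    crossover: Callable[[T, T], Tuple[T, T]],
    mutate: Callable[[T], T],
    fitness: Callable[[T], float],
    n_pairs: int = 2,
) -> List[T]:
    """Sample n_pairs crossover pairs, mutate each child, keep the best two."""
    candidates: List[T] = []
    for _ in range(n_pairs):
        c1, c2 = crossover(pa, pb)
        candidates.append(mutate(c1))
        candidates.append(mutate(c2))
    candidates.sort(key=fitness, reverse=True)  # fittest first
    return candidates[:2]
```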
18 The experimental parameters: [Population size] 150, [Number of child candidates/parents] 2, [Crossover rate] 0.3, [Mutation rate] 0.3, [Tournament size] 3
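The tournament size of 3 refers to tournament selection. A minimal sketch is shown below; drawing contestants with replacement is our assumption, and `tournament_select` is a hypothetical name.

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Draw k individuals at random (with replacement) and return the fittest."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=fitness)
```

A larger k raises the selection pressure; with k = 3 a weak individual can still be chosen when the other two contestants are also weak.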
19 Evaluation experiments: the puzzles used for evaluation. We selected two puzzles from each difficulty level of a puzzle set from a book. For comparison with previous work, we also used the particularly difficult Sudoku puzzles introduced by Timo Mantere in the cited reference.
20 The puzzles used for evaluation: Easy level
21 The puzzles used for evaluation: Intermediate level
22 The puzzles used for evaluation: Difficult level
23 The puzzles used for evaluation: Super Difficult level. Source: Super difficult Sudoku's. Available via WWW: p rch='ct20a6300%20alternative%20project%20work% ' (cited ).
24 Experimental results: benchmark test.
25 Experimental results: benchmark test. Table 1. Comparison of how effectively the GA finds solutions for Sudoku puzzles with different difficulty ratings.
26 Experimental results: comparison with previous research. Table 3. Our result and the result reported in [7].
Sudoku puzzle | Our proposed GA (*1) | Mantere-2008 [7] (*2)
 | 100,000 trials | 100,000 trials
AI Escargot | 83/100 | 5/100
Population size: *1: 150, *2: 11.
Our approach: GA (+ local search). Mantere et al.: GA + cultural algorithm.
27 Comparison with previous research: improve efficiency / speed up. Mantere et al.: cultural algorithm (CA), small population size. Our GA: proper GA design + LS, parallel processing on the GPU.
28 Experimental results: The results show that the proposed genetic operations improved the rate of finding the optimum solution. On the other hand, the processing time was still far worse than that of a backtracking algorithm.
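The slides do not describe the backtracking baseline. For reference, a textbook depth-first backtracking solver is sketched below; it is not necessarily the implementation the authors timed, and the names `solve_backtracking` and `can_place` are ours.

```python
def can_place(grid, r, c, d):
    """True if numeral d can go in cell (r, c) without breaking a constraint."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))


def solve_backtracking(grid):
    """Fill a 9x9 grid (0 = empty) in place; return True if solved."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if can_place(grid, r, c, d):
                        grid[r][c] = d
                        if solve_backtracking(grid):
                            return True
                        grid[r][c] = 0  # undo and try the next numeral
                return False  # no numeral fits this cell: backtrack
    return True  # no empty cell left
```

Each cell is tried with at most nine values and the search prunes as soon as a constraint is violated, which is why such solvers are very fast on typical 9 x 9 puzzles.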
29 Accelerating genetic computation with GPU: GTX460 specifications.
Board: ELSA GLADIA GTX460
#Core: 336 (7 SM x 48 cores / SM)
Clock: 675 MHz
Memory: 1 GB
Shared memory / SM: 48 KB
#Register / SM: (value not given)
#Thread / SM: 1024
The parallelization of genetic computing must be implemented with full consideration given to these features.
30 Parallel processing for individuals: The genetic computing programs run as threads in the SMs and are executed in parallel. Running the same program in each SM with different initial values is expected to serve as a countermeasure against initial value dependency.
31 Parallel processing for genetic manipulation: An example of the swap mutation within a sub-block and the thread assignment.
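32 Estimated execution time, where $T_{exe}$ is the execution time for one individual in one generation, $N$ the population size, and $G$ the number of generations:
Single core: $T_{exe} \times N \times G$
Parallel processing for individuals: $T_{exe} \times \frac{N}{\alpha} \times G$, with $48 < \alpha < N$
Parallel processing for manipulation: $\left[(1-k) + \frac{k}{\beta}\right] T_{exe} \times \frac{N}{\alpha} \times G$, with $0 < k < 1$ and $0 < \beta < 3$
Here $k$ is the fraction of the per-individual work that is split across $\beta$ threads, in the form of Amdahl's law.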
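The three estimates can be evaluated side by side. The sketch below simply codes the formulas; the parameter values in the usage line are hypothetical, chosen only to satisfy the stated ranges, and are not measurements from the slides.

```python
def estimated_times(t_exe, n, g, alpha, k, beta):
    """Return (single_core, parallel_individuals, parallel_manipulation)."""
    single = t_exe * n * g                      # T_exe x N x G
    per_individual = t_exe * (n / alpha) * g    # T_exe x N/alpha x G
    # Amdahl-style factor: fraction k of the work runs on beta threads.
    per_manipulation = ((1 - k) + k / beta) * per_individual
    return single, per_individual, per_manipulation


# Hypothetical example: T_exe = 1 time unit, N = 150, G = 1000,
# alpha = 50, k = 0.5, beta = 2  ->  (150000.0, 3000.0, 2250.0)
print(estimated_times(1.0, 150, 1000, 50, 0.5, 2))
```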
33 The system architecture for multi-core processors (Intel Core i7)
34 Accelerating genetic computation with GPU: 7 blocks / grid; N or 3 x N threads / block.
35 Evaluation tests: execution environment.
CPU (MCP): Intel Core i7 920 (2.67 GHz, 4 cores)
CPU (GPU host): AMD Phenom II X4 945 (3 GHz, 4 cores)
OS: Ubuntu
C compiler: gcc (optimization "-O3")
CUDA Toolkit: 3.2 RC
36 Evaluation tests: test data. The evaluation results are for the problems classified as Super Difficult-1 (SD1), Super Difficult-2 (SD2), and Super Difficult-3 (SD3). (Source: Super difficult Sudoku's. Available via WWW: 08.pdf#search='ct20a6300%20alternative%20Project%20work%202008' (cited ).)
37 Evaluation tests: acceleration effect. Table 6. The acceleration effect of using the GPU/MCP (SD2, givens: 23). Columns: Count [%], Average Gen., Execution time.
Java: 83, 45,468, 7m 50s, 678 x 74
C: 86, 44,250, 1m 26s, 320
Core i7 (#Thread: 8): ,992, 12s
GTX460 (#SM: ): ,142, 6s, 391
Cutoff set: 100,… generations. Population size: 150 x 14
38 Evaluation tests: minimum time (GPU). Table 13. The minimum numbers of generations and the execution times required to solve SD1 through SD3. Columns: Sudoku, Minimum Gen., Execution time [ms]; rows: SD1, SD2, SD3.
39 Evaluation tests: scalability (MCP). Table 7. The number of generations until the correct solution was obtained, the execution time, and the rate of correct answers (SD2, givens: 23). Columns: Count [%], Average Gen., Execution time.
#Th: ,276 28s 41
#Th: ,580 22s 48
#Th: ,261 21s 47
#Th: ,992 12s 12
40 Evaluation tests: scalability (GPU). Table 10. The number of generations until the correct solution was obtained, the execution time, and the rate of correct answers (SD2, givens: 23). Columns: Count [%], Average Gen., Execution time.
#SM: ,067 20s 199
#SM: ,786 16s 958
#SM: ,757 12s 630
#SM: ,254 9s 260
#SM: ,709 8s 287
#SM: ,065 6s 368
#SM: ,142 6s 391
41 Evaluation tests: appropriate population size (GPU).
Area for individual data: 1 byte (char) x 81 x N x 2
Area for selection: 4 bytes (int) x N
Area for crossover: 4 bytes (float) x N/2
Area for mutation: 1 byte (char) x 81N
Total: 249N bytes
Maximum N that can be stored in the 48 KB shared memory: 192
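The shared-memory budget can be checked numerically. This sketch adds up the four areas (for even N they total 249N bytes) and finds the largest N that fits in 48 KB = 49,152 bytes. Restricting N to even values (so N/2 is whole) gives 196; the slide's 192 equals the largest multiple of 32 that fits, which may reflect a warp-size choice, but that is our guess. Function names are hypothetical.

```python
SHARED_MEM_BYTES = 48 * 1024  # 48 KB shared memory per SM, as listed for the GTX460


def shared_bytes(n):
    """Shared-memory bytes needed for a population of n individuals."""
    individuals = 1 * 81 * n * 2   # char genes, two copies per individual
    selection = 4 * n              # one int per individual
    crossover = 4 * (n // 2)       # one float per pair
    mutation = 1 * 81 * n          # char area for mutation
    return individuals + selection + crossover + mutation


def largest_n(step):
    """Largest N that is a multiple of `step` and fits in shared memory."""
    n = 0
    while shared_bytes(n + step) <= SHARED_MEM_BYTES:
        n += step
    return n


print(largest_n(2), largest_n(32))  # 196 192
```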
42 Evaluation tests: appropriate population size (GPU). Table 14. The execution time and the correct solution rates when the number of individuals is set to 192. Columns: Sudoku, Count [%], Average Gen., Execution time.
SD s 751
SD ,481 4s 530
SD ,799 6s 862
Changes shown: -5%, -29%, -21%
43 Evaluation tests: appropriate population size (MCP). Table 12. The result of increasing the number of individuals (SD2). Columns: Count [%], Ave. Gen., Exec. Time, Best Gen.
,641 11s
,992 12s
,115 19s
,441 38s
,441 84s
76 86
44 MCP vs. GPU: These experiments show that the GPU can find solutions faster than the multi-core processor by making use of a higher degree of parallelization.
45 MCP vs. GPU: At the same time, a GPU is more difficult to use than a multi-core processor, which can execute programs in parallel without having to worry about limits on the number of threads or on shared memory capacity.
46 Conclusion: We have used the problem of solving Sudoku puzzles to show that parallel processing of genetic algorithms on a many-core processor can solve difficult problems in practical time.
47 Conclusion: Specifically, we implemented parallel genetic computing on the NVIDIA GTX 460 and the Intel Core i7, and showed that acceleration factors of 7 to 25 relative to a C program running on a CPU are attained, and that a correct solution rate of 100% can be achieved even for super-difficult problems.
48 Future work: We want to try other parallel GA implementations on many-core processors. We need to investigate other approaches to avoid initial value dependency. We want to show that EC (+ GPU) can solve super-difficult Sudoku puzzles in one second.
49 Thank you for your attention!