Adaptive Stable Additive Methods for Linear Algebraic Calculations
1 Adaptive Stable Additive Methods for Linear Algebraic Calculations
József Smidla, Péter Tar, István Maros
University of Pannonia, Veszprém, Hungary
4th of July 2014
2 Outline
1. Linear algebraic kernel; Dot product
2. Hilbert matrix; Condition number; Large condition number aware logic
3. Stable dot product; Primary large condition number detector
3 Linear algebraic kernel
Pannon Optimizer: linear programming solver
Linear programming problem:
    $\min c^T x$
    $Ax = b$
    $x_j \ge 0, \quad j = 1..n$
Linear algebraic kernel: provides linear algebraic algorithms and data structures:
- vector operations (e.g. dot product)
- FTRAN: $\alpha = B^{-1} a$
- BTRAN: $\pi^T = h^T B^{-1}$
where $B$ is the current basis.
4 Dot product
Floating point numbers: $(-1)^s \cdot 1.m_1 m_2 \ldots m_n \cdot 2^e$
- $s \in \{0, 1\}$: sign
- $m_i$: $i$th bit of the mantissa
- $e$: exponent
Errors:
- Rounding error: $A = A + B$, where $|A| \gg |B|$ and $B \ne 0$
- Cancellation: given $A$ and $B \ne 0$ with $A = -B$ in exact arithmetic, compute $C = A + B$. Expectation: $C = 0$. Error: $C = \pm\varepsilon$
These errors can create many fake nonzeros, lead to wrong results and slow down the algorithms.
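Both error types are easy to reproduce in IEEE 754 double precision. The following is a minimal sketch, not code from the slides; the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Rounding (absorption): returns true when adding `small` to `big`
// leaves `big` unchanged, i.e. the small term is lost entirely.
bool small_term_is_absorbed(double big, double small) {
    const double sum = big + small;
    return sum == big;
}

// Cancellation: computes (x + y) - z. When x + y equals z in exact
// arithmetic, the returned value is the rounding residue, not zero.
double cancellation_residue(double x, double y, double z) {
    const double partial = x + y;
    return partial - z;
}
```

For example, `small_term_is_absorbed(1e17, 1.0)` holds because neighbouring doubles near 1e17 are 16 apart, and `cancellation_residue(0.1, 0.2, 0.3)` is a tiny nonzero number, which is exactly the kind of fake nonzero described above.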
5 Intel's SIMD architecture
Parallel operations on multiple data
SSE2: 128-bit wide XMM registers
- One register holds 4 single precision floating point numbers, or 2 doubles, or 4 32-bit integers, or 2 64-bit integers
- Single precision and double precision operations (add, multiply, etc.)
- Bitwise operations
- Integer operations
- Logical operations
- Moving operations
AVX: 256-bit wide YMM registers
- 4 double precision floating point numbers per register
6 Naive add implementation
Given vectors A and B, compute C := A + B, where $c_i := a_i + b_i$.
Requirements:
- Avoid cancellation errors
- Minimize the overhead
Naive implementation:
    Input: A, B
    Output: C
    For each element of A and B: $c_i := a_i + b_i$
7 Naive implementation: does not avoid cancellation errors
Stabilize the result using a relative tolerance $\varepsilon_r$:
    $c_i := a_i + b_i$
    if $(|a_i| + |b_i|)\,\varepsilon_r \ge |c_i|$ then $c_i := 0$
Operations:
- 2 additions, 1 multiplication, 2 assignments
- 3 absolute values
- 1 comparison
- 1 conditional jump
The result is stable, but the algorithm contains conditional jumps, which slow it down.
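The tolerance test can be sketched as a small scalar function. This is an illustrative version with a conditional jump, not the authors' implementation; the name `stable_add` and the tolerance value used in the example are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Adds a and b, and flushes the sum to zero when it is negligible
// relative to the magnitude of the operands (relative tolerance eps_r).
double stable_add(double a, double b, double eps_r) {
    const double c = a + b;
    if ((std::fabs(a) + std::fabs(b)) * eps_r >= std::fabs(c)) {
        return 0.0;  // the sum is treated as cancellation noise
    }
    return c;
}
```

With an assumed tolerance of `1e-10`, `stable_add(0.1 + 0.2, -0.3, 1e-10)` returns exactly zero, while `stable_add(1.0, 2.0, 1e-10)` returns 3.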
8 Our accelerated stable add method
Use Intel's AVX instruction set:
- The results of the parallel comparisons are in a YMM register
- These results can be used for bit masking:
    1 compute $a_i + b_i$ and $(|a_i| + |b_i|)\,\varepsilon_r$
    2 if $(|a_i| + |b_i|)\,\varepsilon_r < |a_i + b_i|$ then mask := 11…1 (all bits set), otherwise mask := 00…0
    3 $c_i := (a_i + b_i)$ bitwise and with mask
- The comparison in step 2 is an AVX instruction
- There is no jumping in the implementation!
- Absolute value: bit masking (bitwise and)
[Figure: example of the masked add in registers YMM0 to YMM4, showing $(|a_i| + |b_i|)\,\varepsilon_r$, $|a_i + b_i|$, the comparison mask, $a_i + b_i$ and the result]
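The masking idea can be emulated in portable scalar C++ without AVX intrinsics. The sketch below is an assumption-laden illustration, not the authors' AVX code: it processes one element at a time, builds the all-ones or all-zeros mask from the comparison result, and clears the sign bit with a bitwise AND to get the absolute value. All names are hypothetical, and a compiler may still choose its own instruction sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern and back (memcpy avoids
// the undefined behaviour of pointer casts).
static std::uint64_t to_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}
static double from_bits(std::uint64_t u) {
    double x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Absolute value by clearing the sign bit with a bitwise AND.
static double abs_by_mask(double x) {
    return from_bits(to_bits(x) & 0x7FFFFFFFFFFFFFFFULL);
}

// c[i] = a[i] + b[i], zeroed by a bit mask when the sum is negligible.
void masked_stable_add(const double* a, const double* b, double* c,
                       std::size_t n, double eps_r) {
    for (std::size_t i = 0; i < n; ++i) {
        const double sum = a[i] + b[i];
        const double bound = (abs_by_mask(a[i]) + abs_by_mask(b[i])) * eps_r;
        // All ones when bound < |sum|, all zeros otherwise.
        const std::uint64_t mask =
            std::uint64_t{0} - static_cast<std::uint64_t>(bound < abs_by_mask(sum));
        c[i] = from_bits(to_bits(sum) & mask);
    }
}
```

An AVX version would apply the same three steps to four doubles at once, with the comparison itself producing the mask in a YMM register.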
9 Naive dot product implementation
We have two $n$-dimensional vectors, $a$ and $b$:
    $a^T b = \sum_{i=1}^{n} a_i b_i$
Problem: we have to use floating point arithmetic, which introduces rounding and cancellation errors.
10 Stable dot product implementation
Separate the negative and positive products. Two variables:
- N: sum of negative products
- P: sum of positive products
Algorithm:
    Read $a_i$ and $b_i$
    $p := a_i b_i$
    if $p < 0$ then
        N := N + p
    else
        P := P + p
Final result := N + P
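A direct transcription of this algorithm into C++ might look as follows. It is a sketch with a conditional jump per product; the function name and the use of `std::vector` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product that accumulates negative and non-negative products
// separately, so that cancellation can only happen in the final addition.
double stable_dot_branching(const std::vector<double>& a,
                            const std::vector<double>& b) {
    assert(a.size() == b.size());
    double neg = 0.0;  // N: sum of negative products
    double pos = 0.0;  // P: sum of non-negative products
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double p = a[i] * b[i];
        if (p < 0.0) {
            neg += p;
        } else {
            pos += p;
        }
    }
    return neg + pos;
}
```

Because only the final `neg + pos` can cancel, that single addition is a natural place to apply a tolerance check such as the relative-tolerance add described earlier in the slides.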
11 Our accelerated stable dot product implementation
Conditional jumping can be avoided using pointer arithmetic:
    union Number {
        double num;
        unsigned long long int bits;
    } number;
    double negpos[2] = {0.0, 0.0};   // [0]: positive sum, [1]: negative sum
    [...]
    const double prod = a * b;
    number.num = prod;
    // The sign bit (bit 63) selects the accumulator
    *(negpos + (number.bits >> 63)) += prod;
AVX can accelerate the stable dot product further.
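For reference, the same sign-bit indexing can be wrapped in a complete function. This sketch swaps the union for `std::memcpy`, since reading an inactive union member is not well defined in standard C++; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Branch-free stable dot product: the sign bit of each product selects
// which of the two accumulators receives it.
double stable_dot_branchless(const double* a, const double* b, std::size_t n) {
    double sums[2] = {0.0, 0.0};  // [0]: sign bit clear, [1]: sign bit set
    for (std::size_t i = 0; i < n; ++i) {
        const double prod = a[i] * b[i];
        std::uint64_t bits;
        std::memcpy(&bits, &prod, sizeof bits);
        sums[bits >> 63] += prod;  // index is 0 or 1, no conditional jump
    }
    return sums[0] + sums[1];
}
```

A product of negative zero lands in the second accumulator, which does not change the result.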
12 Hilbert matrix
Hilbert matrix: $H_{n,n}$, where $h_{i,j} = \frac{1}{i+j-1}$, $i, j = 1, \ldots, n$
Example:
    $H_{4,4} = \begin{pmatrix} 1 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \\ 1/3 & 1/4 & 1/5 & 1/6 \\ 1/4 & 1/5 & 1/6 & 1/7 \end{pmatrix}$
We can construct the following LP problem:
    $\min 0$
    $H_{n,n} x = b$
    $x_j \ge 0, \quad j = 1..n$, and $b_j = \sum_{i=1}^{n} \frac{1}{i+j-1}$
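Generating the test problem takes only a few lines. The sketch below builds the Hilbert matrix and the right-hand side as row sums, so that the all-ones vector solves the system in exact arithmetic; the function names are illustrative, and indices are 0-based, so the entry is 1/(i + j + 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// n x n Hilbert matrix with 0-based indices: H[i][j] = 1 / (i + j + 1).
std::vector<std::vector<double>> hilbert_matrix(std::size_t n) {
    std::vector<std::vector<double>> h(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            h[i][j] = 1.0 / static_cast<double>(i + j + 1);
    return h;
}

// Right-hand side b: the sum of each row, so that x = (1, ..., 1)
// satisfies H x = b in exact arithmetic.
std::vector<double> hilbert_rhs(const std::vector<std::vector<double>>& h) {
    std::vector<double> b(h.size(), 0.0);
    for (std::size_t i = 0; i < h.size(); ++i)
        for (double entry : h[i])
            b[i] += entry;
    return b;
}
```

The entries and row sums are already rounded when stored as doubles, so even a solver working in exact arithmetic on this input sees a slightly perturbed problem.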
13 Solvers and the Hilbert matrix
The solution is optimal if and only if $x_j = 1$, $j = 1..n$.
We have tested CLP and GLPK:
Size | GLPK | Exact GLPK | CLP
3 3 | x j = ± | x j = | x j =
4 4 | x j = ± | x j = | x j =
5 5 | x j = ±.75 0 | x j = | x j =
6 6 | x j = ± | x j = | INFEASIBLE
7 7 | x j = ±.57 | x j = | INFEASIBLE
8 8 | x j = ±.600 | x j = ±0.20 | INFEASIBLE
 | x j = ±6.298 | x j = ±4.24 | INFEASIBLE
 | x j | x j = ±2.682 | INFEASIBLE
We used CLP and GLPK as libraries; the models were generated and solved by C++ programs.
14 Condition number
Measures how much the output changes if the input changes:
    $\kappa(B) = \|B\| \, \|B^{-1}\|$
Problems with computing $\kappa(B)$:
- The matrix changes in every iteration
- If $\kappa(B)$ is large, computing $B^{-1}$ is difficult
The condition number of the $n \times n$ Hilbert matrix is very large; it grows as
    $O\left( \frac{(1+\sqrt{2})^{4n}}{\sqrt{n}} \right)$
Examples: $\kappa(H_{6,6})$, $\kappa(H_{10,10})$, $\kappa(H_{100,100})$
15 Primary large condition number detector
We propose:
- We cannot compute the condition number directly
- However, we can detect the effect of the large condition number!
- The input of the classic FTRAN is the vector $a$: $B^{-1} a = \alpha$
- Create the perturbed copy $\bar{a}$ of $a$
- Use a modified FTRAN, which computes both $B^{-1} a = \alpha$ and $B^{-1} \bar{a} = \bar{\alpha}$
- The modified FTRAN perturbs every sum while computing $\bar{\alpha}$
- If $r = \frac{\max\{\|\alpha\|, \|\bar{\alpha}\|\}}{\min\{\|\alpha\|, \|\bar{\alpha}\|\}}$ is greater than a threshold, the condition number is too large: primary alarm
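The ratio test can be illustrated on a small dense system. The sketch below is a simplification under stated assumptions, not the modified FTRAN of the slides: dense Gaussian elimination with partial pivoting stands in for FTRAN, only the input vector is perturbed (not every intermediate sum), the perturbation pattern is an alternating relative change, and the Euclidean norm is assumed. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solves B x = rhs by Gaussian elimination with partial pivoting.
// Works on copies; stands in for FTRAN in this sketch.
Vector solve(Matrix B, Vector rhs) {
    const std::size_t n = rhs.size();
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(B[i][k]) > std::fabs(B[p][k])) p = i;
        std::swap(B[k], B[p]);
        std::swap(rhs[k], rhs[p]);
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = B[i][k] / B[k][k];
            for (std::size_t j = k; j < n; ++j) B[i][j] -= f * B[k][j];
            rhs[i] -= f * rhs[k];
        }
    }
    Vector x(n);
    for (std::size_t k = n; k-- > 0;) {  // back substitution
        double s = rhs[k];
        for (std::size_t j = k + 1; j < n; ++j) s -= B[k][j] * x[j];
        x[k] = s / B[k][k];
    }
    return x;
}

double norm2(const Vector& v) {
    double s = 0.0;
    for (double e : v) s += e * e;
    return std::sqrt(s);
}

// r = max(|alpha|, |alpha_bar|) / min(|alpha|, |alpha_bar|), where
// alpha_bar is computed from a copy of a with alternating relative
// perturbations of size rel_perturbation.
double detector_ratio(const Matrix& B, const Vector& a, double rel_perturbation) {
    Vector a_bar = a;
    for (std::size_t i = 0; i < a_bar.size(); ++i)
        a_bar[i] *= 1.0 + ((i % 2 == 0) ? rel_perturbation : -rel_perturbation);
    const double n1 = norm2(solve(B, a));
    const double n2 = norm2(solve(B, a_bar));
    return std::max(n1, n2) / std::min(n1, n2);
}
```

For a well-conditioned matrix such as the identity the ratio stays very close to 1, while for a 10 by 10 Hilbert matrix even a relative perturbation of `1e-8` changes the solution norm by orders of magnitude.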
16 Large condition number aware logic
The primary detector is executed when:
- An error occurs, for example fallback to phase-1:
    - If a primary alarm occurs, the algorithm runs the primary detector in the following iterations
    - If primary alarms occur in every subsequent iteration and $r$ does not decrease: secondary alarm
- The algorithm ends:
    - If a primary alarm occurs, the algorithm performs a sensitivity analysis
    - If the sensitivity analysis finds that the result is extremely unstable: secondary alarm
If a secondary alarm occurs, the software restarts from the last basis with modified parameters (enabled scaling, switching to LU decomposition, etc.).
As a last resort, the software restarts from the last basis with enhanced precision arithmetic.
17 Next steps
- We have to integrate the enhanced precision arithmetic into the Pannon Optimizer
- We have to integrate the large condition number recognizer algorithm
- The large condition number recognizer can be accelerated with low-level optimization (SIMD architecture)
Our goal: implement a solver which
- runs fast on stable problems, but recognizes excessively numerically unstable problems
- switches to more precise arithmetic, and solves these problems too
18 CPU: Intel Core i5-3210M, 2.50 GHz
Vector lengths: $10^5$
Dot product operations repeated $10^5$ times
[Bar chart: time [sec] for the naive, conditional jump, SSE2 and AVX implementations]
19 Stable dot product
CPU: Intel Core i5-3210M, 2.50 GHz
Vector lengths: $10^6$
Dot product operations repeated $10^4$ times
[Bar chart: time [sec] for the naive, conditional jump, pointer arithmetic, SSE2 and AVX implementations]
20 Primary large condition number detector
Output of the detector:
    $r = \frac{\max\{\|\alpha\|, \|\bar{\alpha}\|\}}{\min\{\|\alpha\|, \|\bar{\alpha}\|\}}$, $\delta = r - 1$
[Table: value of $\delta$ after the last iteration for the problems 25FV47.MPS, STOCFOR3.MPS, PILOT.MPS, MAROS-R7.MPS and for the Hilbert problems Hilbert 7*, Hilbert 8*, Hilbert 20*, Hilbert 26*, Hilbert 00*]
21 Thank you for your attention!
This publication/research has been supported by the European Union and Hungary and co-financed by the European Social Fund through the project TÁMOP C-//KONV National Research Center for Development and Market Introduction of Advanced Information and Communication Technologies.
More informationIntegrating Benders decomposition within Constraint Programming
Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP
More informationLearn CUDA in an Afternoon: Hands-on Practical Exercises
Learn CUDA in an Afternoon: Hands-on Practical Exercises Alan Gray and James Perry, EPCC, The University of Edinburgh Introduction This document forms the hands-on practical component of the Learn CUDA
More informationCSE 6040 Computing for Data Analytics: Methods and Tools
CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 12 Computer Architecture Overview and Why it Matters DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS
More informationApplications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
More informationIntro to scientific programming (with Python) Pietro Berkes, Brandeis University
Intro to scientific programming (with Python) Pietro Berkes, Brandeis University Next 4 lessons: Outline Scientific programming: best practices Classical learning (Hoepfield network) Probabilistic learning
More informationA 0.9 0.9. Figure A: Maximum circle of compatibility for position A, related to B and C
MEASURING IN WEIGHTED ENVIRONMENTS (Moving from Metric to Order Topology) Claudio Garuti Fulcrum Ingenieria Ltda. claudiogaruti@fulcrum.cl Abstract: This article addresses the problem of measuring closeness
More informationA Constraint Programming based Column Generation Approach to Nurse Rostering Problems
Abstract A Constraint Programming based Column Generation Approach to Nurse Rostering Problems Fang He and Rong Qu The Automated Scheduling, Optimisation and Planning (ASAP) Group School of Computer Science,
More informationGPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
More informationHaswell Cryptographic Performance
White Paper Sean Gulley Vinodh Gopal IA Architects Intel Corporation Haswell Cryptographic Performance July 2013 329282-001 Executive Summary The new Haswell microarchitecture featured in the 4 th generation
More informationUsing EXCEL Solver October, 2000
Using EXCEL Solver October, 2000 2 The Solver option in EXCEL may be used to solve linear and nonlinear optimization problems. Integer restrictions may be placed on the decision variables. Solver may be
More informationGPU Accelerated Monte Carlo Simulations and Time Series Analysis
GPU Accelerated Monte Carlo Simulations and Time Series Analysis Institute of Physics, Johannes Gutenberg-University of Mainz Center for Polymer Studies, Department of Physics, Boston University Artemis
More informationLinear Programming. March 14, 2014
Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1
More information1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12
C5 Solutions 1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12 To Compute 0 0 = 00000000 To Compute 1 Step 1. Convert 1 to binary 00000001 Step 2. Flip the bits 11111110
More informationSolving Linear Systems of Equations. Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.
Solving Linear Systems of Equations Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.edu These slides are a supplement to the book Numerical Methods with Matlab:
More informationVirtual Landmarks for the Internet
Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser
More informationANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING
ANALYSIS OF RSA ALGORITHM USING GPU PROGRAMMING Sonam Mahajan 1 and Maninder Singh 2 1 Department of Computer Science Engineering, Thapar University, Patiala, India 2 Department of Computer Science Engineering,
More information