An Adaptable Fast Matrix Multiplication (AFMM) Algorithm: Going Beyond the Myth of Decimal-War


Niraj Kumar Singh 1, Soubhik Chakraborty 2,* and Dheeresh Kumar Mallick 1

1 Department of Computer Science & Engineering, B.I.T. Mesra, Ranchi-835215, India
2 Department of Applied Mathematics, B.I.T. Mesra, Ranchi-835215, India
*Email address of the corresponding author: soubhikc@yahoo.co.in (S. Chakraborty)

Abstract: In this paper we present an adaptable fast matrix multiplication (AFMM) algorithm for two n×n dense matrices which computes the product matrix with average complexity T_avg(n) = μ·d1·d2·n^3, with the acknowledgement that this average count is obtained with addition as the basic operation rather than multiplication, the customary choice of basic operation in existing matrix multiplication algorithms. Here d1 and d2 are the densities (fractions of non-zero elements) of the pre and post factor matrices, and μ is the expected value of the non-zero elements of the post factor matrix. Since a single addition operation is much cheaper than a single multiplication operation (though the cost ratio differs from one machine to another), our algorithm finds the product matrix without using a single multiplication operation. The replacement of multiplications by additions has several significant and interesting aspects, as it adds non-determinism to a problem which is otherwise considered deterministic. We argue that, for trivial as well as non-trivial inputs, the AFMM algorithm can beat Strassen's algorithm for matrix multiplication.

Keywords: Dense matrices; Deterministic matrix multiplication algorithm; Non-determinism; Adaptive matrix multiplication algorithm; Statistical bound estimate

1. Introduction

"Not everything that can be counted counts and not everything that counts can be counted." Albert Einstein (1879-1955)

The classical algorithm, along with its major variants in which fixing the instance size n fixes all the computing operations, is deterministic: it performs exactly the same operations when re-run on identical inputs. In this paper we present an adaptable matrix multiplication algorithm for two n×n dense matrices. Our algorithm computes the resultant matrix in average time O(μ·d1·d2·n^3) when analyzed mathematically; this analysis is followed by a parameterized analysis as well. Here d1 and d2 are the densities (fractions of non-zero elements) of the pre and post factor matrices. Since Strassen's discovery [1] in 1969, several other algorithms for multiplying two n×n matrices of real numbers in O(n^α) time with progressively smaller exponents α have been invented. The fastest algorithm so far is that of Virginia Williams [2], with its efficiency in O(n^2.3727). The decreasing values of the exponents have been obtained at the expense of the increasing complexity of these algorithms, and because of large multiplicative constants none of them is of practical use [3].

The deterministic nature of the response remains unchanged even when we switch to more sophisticated algorithms for this problem, and this development is interesting only from a theoretical point of view. In practice, especially for average case analysis, it is the actual machine time that matters more than theoretical predictions based on an assumed pivotal, i.e., dominant, operation in the code. Irrespective of their sophistication, a major commonality among the present approaches is their parameter independence: in terms of the number of basic operations performed, they are almost unaffected by the values of the inputs. This article is a major step in this direction, since the proposed algorithm adapts to the input values present in the matrices. Remembering that a single addition operation is much cheaper than a single multiplication operation (though the ratio differs from one machine to another), we have introduced an algorithm which solves this problem without using even a single multiplication operation. The replacement of multiplications by additions has several significant and interesting aspects, as it adds non-determinism to a problem which is otherwise considered deterministic [4]. With suitable modifications even the classical matrix multiplication algorithm qualifies as a potential candidate for parameterized complexity analysis.

2. Adaptable Matrix Multiplication (AFMM) Algorithm

Below we present the adaptable matrix multiplication (AFMM) algorithm for the multiplication of two dense square matrices. Let X and Y be the pre and post factor square matrices of size n×n with densities d1 and d2 respectively, and let Z be the resultant (product) matrix, initialized with all zeroes. In Case-A, the pre factor matrix X consists of real values generated randomly from some probability distribution, whereas matrix Y consists of integer values whose expected value is assumed to be μ. In Case-B, the values inside the pre and post factor matrices swap these properties. As the method outlined below deals with the multiplication of two dense matrices, we recommend the plain two-dimensional array (adjacency matrix) as the data structure for storing the matrix data.

Case-A:

REAL X, Z, Base
INTEGER Y, I, J, K, Rep-factor, p

1. WHILE I ← 1 TO N DO
2. .. WHILE K ← 1 TO N DO
3. .... Base ← X[I, K]
4. .... IF (Base EQUALS 0)
5. ...... Redirect control to the beginning of the K loop
6. .... WHILE J ← 1 TO N DO
7. ...... Rep-factor ← Y[K, J]
8. ...... Z[I, J] ← Z[I, J] + Σ_{p=1}^{Rep-factor} Base   // Base is added Rep-factor times
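To make the control flow concrete, the following is a minimal C sketch of Case-A. It is one plausible reading of the pseudocode above, not code from the paper: the function name afmm_case_a, the fixed 3×3 size, and the sample values in main are illustrative only, and Y is assumed to hold non-negative integers so that the rep-factor loop is well defined.

    /* Minimal C sketch of AFMM Case-A (illustrative; assumes Y holds
       non-negative integers, as in the analysis of Section 2.2). */
    #include <stdio.h>

    #define N 3

    void afmm_case_a(double X[N][N], int Y[N][N], double Z[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                Z[i][j] = 0.0;                 /* Z starts as the all-zero matrix */

        for (int i = 0; i < N; i++) {
            for (int k = 0; k < N; k++) {
                double base = X[i][k];
                if (base == 0.0)
                    continue;                  /* zero base: skip to the next K (statement 5) */
                for (int j = 0; j < N; j++) {
                    int rep = Y[k][j];         /* the rep-factor */
                    for (int p = 0; p < rep; p++)
                        Z[i][j] += base;       /* summation of statement 8: no multiplication */
                }
            }
        }
    }

    int main(void)
    {
        double X[N][N] = {{1.5, 0.0, 2.0}, {0.0, 3.0, 0.0}, {1.0, 1.0, 1.0}};
        int    Y[N][N] = {{2, 0, 1}, {1, 1, 0}, {0, 4, 2}};
        double Z[N][N];

        afmm_case_a(X, Y, Z);
        for (int i = 0; i < N; i++) {          /* prints the ordinary product X × Y */
            for (int j = 0; j < N; j++)
                printf("%7.2f", Z[i][j]);
            printf("\n");
        }
        return 0;
    }

Case-B below is symmetric: the integer rep-factors are drawn from X and the real base values from Y.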

Case-B:

REAL Y, Z, Base
INTEGER X, I, J, K, Rep-factor, p

1. WHILE I ← 1 TO N DO
2. .. WHILE K ← 1 TO N DO
3. .... Rep-factor ← X[I, K]
4. .... WHILE J ← 1 TO N DO
5. ...... Base ← Y[K, J]
6. ...... IF (Base EQUALS 0)
7. ........ Redirect control to the next iteration of the J loop
8. ...... Z[I, J] ← Z[I, J] + Σ_{p=1}^{Rep-factor} Base   // Base is added Rep-factor times

2.1 Proof of Correctness

We present the following arguments in favor of the correctness of the AFMM algorithm.

Point (1): AFMM computes the correct result (Case-A is discussed). The computation proceeds in row-major fashion. A selected base element X[I, K] is added Y[K, J] times, which is precisely the product X[I, K]·Y[K, J], and this activity is repeated n times for J = 1 to n. Once the J values are exhausted, the innermost (J) loop terminates and the whole process is iterated for the incremented value of K. Upon exhausting the K values we have the final result for the I-th row of the resultant matrix Z. The whole of the above process is repeated for the remaining I values (2 to n) as well, and when all the I values are exhausted we get the final result in the form of the matrix Z. Also, as the rep-factor is assumed to be an integer, the upper limit of the summation in statement 8 is well defined.

Point (2): AFMM stops eventually. As the depth of each of the three loops is bounded from above by the dimension of the pre and post factor matrices, the algorithm always terminates.

2.2 Theoretical Analysis (Counting the Expected Number of Additions)

Since addition is the only key operation identified in the algorithm, we obtain the theoretical complexity in terms of the expected number of additions. The expected number of additions E(A) is a function of both the dimensions of the two matrices and the expected value, denoted E(X) = μ, of the elements of the post factor matrix Y (of the pre factor matrix X in Case-B). Let μ be the expected value of the non-zero elements of the post factor matrix Y (Case-A), which is known to the experimenter in advance, and let μ̄ be the mean over the entire set of post factor matrix elements.

A little calculation shows that μ̄ equals μ·d2, since a fraction d2 of the elements are non-zero with mean μ and the remaining elements are zero. The expected number of additions is E(A) = n·(n^2·d1)·μ̄. Here the factor n^2·d1 counts the non-zero elements of the pre factor matrix, and the factor n is the number of times each such element contributes to the final result. Substituting μ·d2 for μ̄ we get E(A) = μ·d1·d2·n^3, the expected number of additions. Trivially, when the product of the two density factors is kept fixed at n^(-1) we obtain a quadratic run time, provided the factor μ is reasonable. For instance, with n = 1000, d1 = 1/3, d2 = 1/2 and μ = 3, E(A) = 3·(1/6)·10^9 = 5×10^8 expected additions, against the 10^9 multiplications of the classical algorithm.

2.3 Empirical Analysis through a Statistical Bound Estimate (Empirical-O)

This section presents the empirical results. The observed mean times (in seconds) are recorded in Table (1). The average case analysis was done by working directly on program run time, which can be used to estimate the weight-based statistical bound over a finite range by running computer experiments [5], [6]. This estimate is called empirical-O [4], [7]. Here the time of an operation is taken as its weight. Weighing permits the collective consideration of all operations in a conceptual bound, which we call a statistical bound in order to distinguish it from the count-based mathematical bounds that are operation specific. The credibility of empirical-O depends on the design and analysis of our special computer experiment in which time was the response. See [4] for more insight.

System specification: all the computer experiments were carried out on a 1600 MHz Pentium processor with 512 MB RAM.

2.3.1 Parameterized Complexity Analysis

This section reports a systematic measurement of the proposed algorithm over differing parameter values. Through our experimentation we have re-verified the parameter-independent nature of the standard ijk and ikj algorithms [8]. The replacement of multiplications by additions, however, induces non-determinism in the AFMM algorithm, which therefore calls for a comprehensive parameterized analysis. The observed mean time over a sufficient number of readings for each specified value of n is recorded in Table (1). The parameters d1 and d2 are 1/3 and 1/2 respectively for μ = 1 through 7, and 1/5 and 2/5 respectively for μ equal to 14 and 21. These data correspond to Case-A of the proposed algorithm.

Table (1): Observed mean time in seconds

    N      ijk        ikj       AFMM      AFMM      AFMM      AFMM      AFMM      AFMM
                                μ=1       μ=3       μ=5       μ=7       μ=14      μ=21
   250     0.411667   0.328     0.1752    0.2784    0.3316    0.3498    0.3188    0.2874
   500     2.718667   2.32      0.918     1.5406    1.943     2.331     1.9122    1.9874
   750    10.235      7.75533   2.8475    4.9718    6.38      7.6826    6.3218    6.4328
  1000    25.125     18.43233   6.67175  11.8064   15.073    18.2756   14.9158   15.3374
  1250    49.437     35.75     13.0897   23.0705   29.489    35.594    29.172    29.9267
  1500    85.531     61.688    22.5625   40.078    50.9295   61.718    50.3515   52.2
  1750   135.906     98.515    36.0235   63.2655   80.6235   97.6955   80.281    81.9335
  2000   205.422    147.078    53.375    94.6875  120.8515  145.8985  119.516   122.491
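As a quick cross-check of the analysis against Table (1), the following sketch evaluates the derived count E(A) = μ·d1·d2·n^3 at the reported parameter settings. The parameter values are taken from the text above; the variable names and output format are illustrative only.

    /* Evaluates E(A) = mu * d1 * d2 * n^3 for the parameter settings of
       Table (1): d1 = 1/3, d2 = 1/2 for mu = 1..7, and d1 = 1/5, d2 = 2/5
       for mu = 14 and 21 (values as stated in Section 2.3.1). */
    #include <stdio.h>

    int main(void)
    {
        const double n    = 2000.0;          /* largest size in Table (1) */
        const int    mu[] = {1, 3, 5, 7, 14, 21};
        const double d1[] = {1.0/3, 1.0/3, 1.0/3, 1.0/3, 1.0/5, 1.0/5};
        const double d2[] = {1.0/2, 1.0/2, 1.0/2, 1.0/2, 2.0/5, 2.0/5};

        for (int i = 0; i < 6; i++) {
            double ea = mu[i] * d1[i] * d2[i] * n * n * n;  /* expected additions */
            printf("mu = %2d: E(A) = %.3e additions (n^3 = %.3e multiplications)\n",
                   mu[i], ea, n * n * n);
        }
        return 0;
    }

Note how the sparser densities used for μ = 14 and 21 pull their expected counts back toward those of the μ = 5 column, which is consistent with the closeness of the corresponding run times in the table.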

Fig. (1): Plot of the data in Table (1) for the AFMM algorithm at the various specified parameter values. The constant k appearing in the plot denotes μ, the mean of the non-zero elements of matrix Y.

Taking the run time of the ikj algorithm as the baseline, we observe that for the specified parameter values the run time of the AFMM algorithm is reduced by around 64 percent when the expected value is 1, and by 19 and 17 percent when the expected value is 14 and 21 respectively.

3. Conclusions

Amongst the algorithms for multiplying two matrices, Strassen's algorithm has an edge over the others for sufficiently large (practically feasible) matrices [3]. Still, with suitable constraints (while keeping both matrices dense), as discussed above, we can always generate inputs for which AFMM beats Strassen's algorithm. Trivially, by keeping the product d1·d2 at 1/n we can hope for quadratic performance from AFMM, since then E(A) = μ·(1/n)·n^3 = μ·n^2; such performance is never expected from Strassen's algorithm irrespective of the type of input. Even for non-trivial inputs with a sufficiently small product d1·d2 we expect better performance from AFMM. Fortunately, the assumptions made in this paper regarding the values in the pre and post factor matrices can be generalized to arbitrary-valued dense matrices. Our work towards this generalization is in progress and we are hopeful of obtaining some exciting results from it.

References

1. Strassen, V.: Gaussian Elimination is not Optimal. Numerische Mathematik, Vol. 13, 354-356 (1969)
2. Williams, V. V.: Multiplying Matrices Faster than Coppersmith-Winograd. In: STOC '12: Proceedings of the 44th Symposium on Theory of Computing, 887-898, ACM, New York (2012)
3. Levitin, A.: Introduction to the Design & Analysis of Algorithms, Second Edition. Pearson Education (2008)
4. Chakraborty, S., Sourabh, S. K.: A Computer Experiment Oriented Approach to Algorithmic Complexity. Lambert Academic Publishing (2010)
5. Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and Analysis of Computer Experiments. Statistical Science, Vol. 4, No. 4 (1989)
6. Fang, K. T., Li, R., Sudjianto, A.: Design and Modeling of Computer Experiments. Chapman and Hall (2006)
7. Sourabh, S. K., Chakraborty, S.: On Why an Algorithmic Time Complexity Measure can be System Invariant rather than System Independent. Applied Mathematics and Computation, Vol. 190, Issue 1, 195-204 (2007)
8. Sahni, S.: Data Structures, Algorithms and Applications in Java, Second Edition. University Press, 145-149 (2005)