Fast construction of k-nearest Neighbor Graphs for Point Clouds

Save this PDF as:

Size: px
Start display at page:

Transcription

1 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 1 Fast construction of k-nearest Neighbor Graphs for Point Clouds Michael Connor, Piyush Kumar Abstract We present a parallel algorithm for k-nearest neighbor graph construction that uses Morton ordering. Experiments show that our approach has the following advantages over existing methods: 1) Faster construction of k-nearest neighbor graphs in practice on multi-core machines. ) Less space usage. 3) Better cache efficiency. 4) Ability to handle large data sets. 5) Ease of parallelization and implementation. If the point set has a bounded expansion constant, our algorithm requires one comparison based parallel sort of points according to Morton order plus near linear additional steps to output the k-nearest neighbor graph. Index Terms Nearest neighbor searching, point based graphics, k-nearest neighbor graphics, Morton Ordering, parallel algorithms. 1 INTRODUCTION This paper presents the design and implementation of a simple, fine-grain parallel and cache-efficient algorithm to solve the k-nearest neighbor graph computation problem for cache-coherent shared memory multi-processors. Given a point set of size n it uses only On) space. We show that with a bounded expansion constant γ as described in [15]), using p threads assuming p cores are available for computation) we can compute the k-nng in O n p k log k) unit steps, plus one parallel sort on the input. We also present extensive experimental results on a variety of architectures. The k-nearest neighbor graph problem is defined as follows: given a point cloud P of n points in R d and a positive integer k n 1, compute the k-nearest neighbors of each point in P. More formally, let P = {p 1, p,..., p n } be a point cloud in R d where d 3. For each p i P, let Ni k be the k points in P, closest to p i. The k-nearest neighbor graph k-nng) is a graph with vertex set {p 1, p,..., p n } and edge set E = {p i, p j ) : p i Nj k or p j Ni k }. The well known all-nearest-neighbor problem corresponds to the k = 1 case. For the purpose of this paper we are constraining ourselves to Euclidean distance, as well as low dimensions. The problem of computing k-nearest neighbor graph computation arises in many applications and areas including computer graphics, visualization, pattern recognition, computational geometry and geographic information systems. In graphics and visualization, computation of k-nng forms a basic building block in solving many important problems including normal estimation [18], surface simplification [4], finite element modeling [8], Michael Connor and Piyush Kumar are with the Department of Computer Science, Florida State University, Tallahassee, FL 336. This research was funded by NSF through CAREER Award CCF Software available at : stann. Manuscript received Oct 19, 8; revised January 11, 7. shape modeling [5], watermarking [11], virtual walkthroughs [7] and surface reconstruction [1], [13]. With the growing sizes of point clouds, the emergence of multi-core processors in mainstream computing and the increasing disparity between processor and memory speed; it is only natural to ask if one can gain from the use of parallelism for the k-nng construction problem. The naive approach to solve the k-nng construction problem uses On ) time and Onk) space. Theoretically, the k-nng can be computed in On log n + nk) [4]. The method is not only theoretically optimal and elegant but also parallelizable. Unfortunately, in practice, most practitioners choose to use variants of kd-tree implementations [18], [3], [8] because of the high constants involved in theoretical algorithms [9], [4], [9], [1]. In low dimensions, one of the best kd-tree implementations is by Arya et al. []. Their kd-tree implementation is very carefully optimized both for memory access and speed, and hence has been the choice of practitioners for many years to solve the k-nng problem in point based graphics [18], [5], [8]. In our experiments we use this implementation as a basis of comparison, and results indicate that for k-nng construction our algorithm has a distinct advantage. Our method uses Morton order or Z-order of points, a space filling curve that has been used previously for many related problems. Tropf and Herzog [7] present a precursor to many nearest neighbor algorithms. Their method uses one, unshifted Morton order point set to conduct range queries. The main drawbacks of their method were: 1) It does not allow use of non-integer keys. ) It does not offer a rigourous proof of worst case or expected run time, in fact it leaves these as an open problem. 3) It offers no scheme to easily parallelize their algorithm. Orenstein and Merrett [] described another data structures for range searching using Morton order on integer keys [19]. Bern [3] described an algorithm using d shifted copies of a point set in Morton order to compute an approximate solution to the k-nearest

2 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 neighbor problem. This paper avoids a case where two points lying close together in space are far away in the one-dimensional order, in expectation. In all these algorithms, the Morton order was determined using explicit interleaving of the coordinate bits. Liao et al. [16] used Hilbert curves to compute an approximate nearest neighbor solution using only d + 1 shifted curves. Chan [5] refined this approach, and later presented an algorithm that used only one copy of the data, while still guaranteeing a correct approximation result for the nearest neighbor problem [6]. It is worth noting that in practice, these methods for k-nearest neighbor computation are less efficient than state of the art kd-tree based algorithms [6]. However, in this paper, we show that a single shift based algorithm, that is also exact, cache friendly and multi-core aware, can compete with a kd-tree algorithm in both theory and practice. Recently, the k-nng construction problem has also been studied in the external memory setting or using the disk access model [6]. The design and implementation of our algorithm is more tuned toward the cacheoblivious model [14] and hence differs significantly from [6]. While our algorithm was not specifically designed for external memory, experiments have shown that through the use of large amounts of swap space it can handle very large data sets. Parallel to this work, the same group has announced a GPU based k-nng algorithm [17]. The paper is mostly concentrated on solving the similarity join problem efficiently, which is a different problem than the k-nng problem. We believe that our rigorous analysis on the running time for getting exact answers, as well as the use of floating point coordinates in Morton ordering Algorithm 1) could help the GPU implementation of similarity joins. Among the various multi-processor/core architectures that have been proposed in recent times, a new tightly coupled multi-core multi-processor architecture is gaining a central place in high performance computing. These new architectures have a shared address programming model with physically distributed memory and coherent replication either in caches or main memory). These architectures usually comprise of one or more multi-core or chip multi-processor CPUs, and are emerging as the dominant computing platform. This class of systems differs a lot from the traditional message passing machines, clusters or vector supercomputers. Our algorithm mainly consists of the following three high level components: Preprocessing Phase: In this step, we sort the input points P using Morton ordering a space filling curve). Sliding Window Phase: For each point p in the sorted array of points, we compute its approximate k-nearest neighbors by scanning Ok) points to the left and right of p. Another way to think of this step is, to slide a window of length Ok) on the sorted array and find the k-nearest neighbors restricted to this window. Search Phase: We refine the answers of the last phase by zooming inside the constant factor approximate k-nearest neighbor balls using properties of the Morton order. The first phase is implemented using parallel Quick Sort [8]. In section we describe Morton ordering and a parallel implementation of the final two phases. Section also presents the analysis of the running time of our algorithm. Section 3 describes the experimental setup we use. Section 4 presents our experimental results. Section 5 concludes the paper. METHODOLOGY In this section we describe our algorithm in detail. Before we start the description, we need to describe the Morton order on which our algorithm is based..1 Morton Ordering Morton order or Z-order is a space filling curve with good locality preserving behavior. It is often used in data structures for mapping multi-dimensional data to one dimension. The Z-value of a point in multiple dimensions can be calculated by interleaving the binary representations of its coordinate values. Our preprocessing phase consists of sorting the input data using their Z- values without explicitly computing the Z-value itself. The Morton order curve can be conceptually achieved q Fig. 1: The Morton order curve preceding the upper left corner, and following the lower right corner of a box, will never intersect the box. by recursively dividing a d-dimensional cube into d cubes, then ordering those cubes, until at most 1 point resides in each cube. In and 3 dimensions, the resulting p

3 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 3 Algorithm 1 Floating Point Morton Order Algorithm Require: d-dimensional points p and q Ensure: true if p < q in Morton order q 1: procedure COMPAREpoint p, point q) : x ; dim 3: for all j = to d do 4: y XORMSBp j), q j) ) 5: if x < y then 6: x y; dim j 7: end if 8: end for 9: return p dim) < q dim) 1: end procedure p Fig. : The smallest quadtree box containing two points will also contain all points lying between the two in Morton order. 11: procedure XORMSBdouble a, double b) 1: x EXPONENTa); y EXPONENTb) 13: if x = y then 14: z MSDBMANTISSAa),MANTISSAb)) 15: x x z 16: return x 17: end if 18: if y < x then return x 19: else return y : end procedure trees are sometimes referred to as quadtrees and octrees respectively. Although, we use quadtrees to explain our algorithm, the algorithm itself extends easily to higher dimensions. Two simple properties of Morton order, shown in Figure 1 and Figure, will be used to prune points in our k-nng construction algorithm. Chan [6] showed that the relative Morton order of two integer points can be easily calculated, by determining which pair of coordinates have the first differing bit in binary notation in the largest place. He further showed that this can be accomplished using a few binary operations. Our preliminary experiments with sorting points in Z-order showed that Chan s trick was faster than explicitly interleaving the binary representation of the coordinate values. While Chan s method only applies to integer types, it can be extended to floating point types as shown in Algorithm 1. The algorithm takes two points with floating point coordinates. The relative order of the two points is determined by the pair of coordinates who have the first differing bit with the highest exponent. The XORMSB function computes this by first comparing the exponents of the coordinates, then comparing the bits in the mantissa, if the exponents are equal. Note that the MSDB function on line 14 returns the most significant differing bit of two integer arguments. This is calculated by first XORing the two values, then shifting until we reach the most significant bit. The EXPONENT and MANTISSA functions return those parts of the floating point number in integer format. This algorithm allows the relative Morton order of points with floating point coordinates to be found in Od) time and space complexity.. Notation In general, lower case Roman letters denote scalar values. p and q are specifically reserved to refer to points. P is reserved to refer to a point set. n is reserved to refer to the number of points in P. We write p < q iff the Z-value of p is less than q > is used similarly). We use p s to denote the shifted point p + s, s,..., s). P s = {p s p P }. distp, q) denotes the Euclidean distance between p and q. p i is the i-th point in the sorted Morton ordering of the point set. We use p j) to denote the j-th coordinate of the point p. The Morton ordering also defines a quadtree on the point cloud. box Q p i, p j ) refers to the smallest quadtree box that contains the points p i and p j. We use boxc, r) to denote a box with center c and radius r. The radius of a box is the radius of the inscribed sphere of the box. k is reserved to refer to the number of nearest neighbors to be found for every point. d is reserved to refer to the dimension. In general, upper case Roman letters B) refer to a bounding box. Bounding boxes with a subscript Q B Q ) refer to a quadtree box. and distp, B) is defined as the minimum distance from point p to box or quadtree box) B. E[ ] is reserved to refer to an expected value. E refers to an event. P E) is reserved to refer to the probability of an event E. A i is reserved to refer to the current k nearest neighbor solution for point p i, which may still need to be refined to find the exact nearest neighbors. nn k p, {}) defines a function that returns the k nearest neighbors to p from a set. The bounding box of A i refers

6 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 6 a) b) Fig. 4: a) B lands cleanly on the quadtree box B Q twice its size. b) B lands cleanly on a quadtree box times its size. In both figures, if the upper left corner of B lies in the shaded area, the box B does not intersect the boundary of B Q. Obviously, a) happens with probability 1 4 probability 1 )d in general dimension) and b) happens with probability 1 ) = 9 16 probability 1 ) d in general dimension). B Q is j 1)/ j ) d. If B Q is the smallest quadtree box housing B, then all quadtree boxes with side lengths h+1, h+,..., h+j 1 cannot contain B Q. This probability is given by: j 1 l ) d ) 1 1 = l j 1 l ) d l 1) d l ) d The probability of B Q containing B is therefore P E j ) = j 1) d j ) d j 1 l ) d l 1) d l ) d We now use the following inequality: Given v such that < v < 1; 1 v) d 1 dv + dd 1) v!, which can be easily proved using induction or alternating series estimation theorem. Putting v = 1/ l we get: 1 v) d 1 dv1 + v/) + d v / 1 d l 1 + 1/ l+1 ) + d / l+1 To simplify the sum, we will use a Taylor series with v = 1 l. 1 v) d 1 dv + dd 1) v dd 1)d )v3! 3! dv + d d) v! 1 dv + d v dv 1 dv 1 + v 1 d l l+1 ) + d v ) + d l+1 Fig. 5: All boxes referred in Lemma.3. Then, by substituting and simplifying P E j ) 1 1 ) d j 1 [ d j l ) d j 1 j d l 1 1 ) d j 1 d j l 1 1 ) d d j 1 j j j l+1 ) d l+1 [ 1 + d l+1 d l+1 Lemma.. The linear scan phase of the algorithm produces an approximate k-nearest neighbor box B centered at p i with radius at most the side length of B Q. Here B Q is the smallest quadtree box containing B, the k-nearest neighbor box of p i. Proof: Our algorithm scans at least p i k... p i+k, and picks the top k nearest neighbors to p i among these candidates. Let a be the number of points between p i and the largest Morton order point in B. Similarly, let b be the number of points between p i and the smallest Morton order point in B. Clearly, a+b k. Note that B is contained inside B Q, hence µb Q ) k. Now, p i... p i+k must contain a points inside B Q. Similarly, p i k... p i must contain at least b points from B Q. Since we have collected at least k points from B Q, the radius of B is upper bounded by the side length of B Q. Lemma.3. The smallest quadtree box containing B, B Q, ] ]

7 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 7 has only a constant number of points more than k in expectation. Proof: Let there be k points in B. Clearly k =Ok) given γ =O1). The expected number of points in B Q is at most E[γ x k ], where x is such that if the side length of B is h then the side length of B Q is h+x. Let the event E x be defined as this occurring for some fixed value of x. Recall from Lemma.1 that j is such that the side length of B Q is h+j. The probability for the event E j is P E j ) = j 1) d j ) d 1 1 j 1 dj 1 )d j j j l ) d l 1) d l ) d From Lemma. B has a side length of at most h+j+1 = h. Let E j be the event that, for some fixed h, B is contained in B Q with side length +j h. Note that E j has the same probability mass function as E j, and that the value of j is independent of j. Given this, P E x) = x 1 P E j)p E j ). From this, E[γx k ] follows E[γ x k] γ x kp E x) = x= x= x= x= x 1 γ x k P E j )P E j ) x 1 γ x k 1 1 dj 1 )d j j j 1 1 d x j 1 x j )d x j) x j) x 1 γ x k 1 1 j )d 1 1 d x x j )d x 1+j)x+j We now use the fact that, j {1,,..., x 1}: 1 j ) d 1 j x ) d 1 x/ ) d which can be proved by showing: 1 j )1 j x ) 1 x/ ) or j + j x x/+1 which is true because a+b ab. Putting this simplified upper bound back in the expectation calculation we get: E[γ x k] = x 1 γ x k x= k k x= x= 1 1 ) d d x x/ γ x 1 1 x/ ) d x 1 d x x 1+j)x+j x 1+j)x+j γ x 1 1 ) d x 1 d x x x )/ x/ jx j) It is easy to show that jx j) x /4 j {1,..., x 1}: Let a = x/ and b = j x/. Then a +b )a b ) = a b where the LHS is jx j) and RHS is x /4 b. Hence jx j) x /4. Using the fact that jx j) x/) in the expectation calculation, we have: k γ x 1 1 ) d d x x x)/ x x /4 x/ x= k xdγ ) x 1 1 ) d x /4 x/ x= Putting y = x/ and c = dγ), we get: k y c y y 1 1 ) d y y=1 Using the Taylor s approximation: 1 1 y ) d 1 d y + y) + d y+1) which when substituted: c ) y 1 k y d y y + y) + d y+1) dy y= Integrating and simplifying using the facts that the error function, erfx) encountered in integrating the normal distribution follows erfx) 1, and c = dγ) =O1), we have: 1 ) ) k ln + π ln e lnc)) 4 ln) = Ok) Now we are ready to prove our main theorem on the running time of our algorithm: Theorem.4. For a given point set P of size n, with a bounded expansion constant and a fixed dimension, in a constrained CREW PRAM model, assuming p threads, the k- nearest neighbor graph can be found in one comparison based sort plus O n p k log k) expected time. Proof: Once B is established in the first part of the algorithm, B Q can be found in Olog k) time by using a binary search outward from p i this corresponds to lines 5 to 14 in Algorithm ). Once B Q is found, it takes at most another Ok) steps to report the solution. There is an additional Olog k) cost for each point update to maintain a priority queue of the k nearest neighbors of p i. Since the results for each point are independent, the neighbors for each point can be computed by an independent thread. Note that our algorithm reduces the problem of computing the k-nng to a sorting problem which can be solved optimally) when k =O1), which is the case for many graphics and visualization applications. Also the expectation in the running time is independent of the input distribution and is valid for arbitrary point clouds.

9 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER Size of Input Setmillions) Size of Input Setmillions) a) Graph of 1-NN graph construction time vs. number of data points on the Intel architecture. Each algorithm was run using 8 threads in parallel. a) Graph of 1-NN graph construction time vs. number of data points on AMD architecture. Each algorithm was run using 16 threads in parallel Size of Nearest Neighbor Ball Size of Nearest Neighbor Ball K) b) Graph of k-nn graph construction time for varying k on the Intel architecture. Each algorithm was run using 8 threads in parallel. Data sets contained one million points. b) Graph of k-nn graph construction time for varying k on AMD architecture. Each algorithm was run using 16 threads in parallel. Data sets contained ten million points Number of Threads Number of Threads c) Graph of 1-NN graph construction time for varying number of threads on Intel architecture. Data sets contained ten million points. Fig. 6: Intel Results c) Graph of 1-NN graph construction time for varying number of threads on AMD architecture. Data sets contained ten million points. Fig. 7: AMD Results

10 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER Size of Input Setmillions) a) Graph of 1-NN graph construction time vs. number of data points on Sun architecture. Each algorithm was run using 18 threads in parallel. Allocated Memory per Pointbytes) Number of Points millions) Fig. 9: Graph of memory usage per point vs. data size. Memory usage was determined using valgrind Size of Nearest Neighbor Ball b) Graph of k-nn graph construction time for varying k on Sun architecture. Each algorithm was run using 18 threads in parallel. Data sets contained ten million points. Cache Misses/ Size of Point Set k) Fig. 1: Graph of cache misses vs. data set size. All data sets were uniformly random 3-dimensional data sets. Cache misses were determined using valgrind which simulated a MB L1 cache Number of Processors 5 CONCLUSIONS We have presented an efficient k-nearest neighbor construction algorithm which takes advantage of multiple threads. While the algorithm performs best on point sets that use integer coordinates, it is clear from experimentation that the algorithm is still viable using floating point coordinates. Further, the algorithm scales well in practice as k increases, as well as for data sets that are too large to reside in internal memory. Finally, the cache efficiency of the algorithm should allow it to scale well as more processing power becomes available. c) Graph of 1-NN graph construction time for varying number of threads on Sun architecture. Data sets contained ten million points. Fig. 8: Sun Results Acknowledgements We would like to thank Samidh Chatterjee for his comments on the paper and Sariel Har-Peled for useful discussions.

11 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, SEPTEMBER 9 11 REFERENCES [1] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva. Point set surfaces. In VIS 1: Proceedings of the conference on Visualization 1, pages 1 8, Washington, DC, USA, 1. IEEE Computer Society. [] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM, 45:891 93, [3] M. Bern. Approximate closest-point queries in high dimensions. Inf. Process. Lett., 45):95 99, [4] P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 41):67 9, [5] T. M. Chan. Approximate nearest neighbor queries revisited. In SCG 97: Proceedings of the thirteenth annual symposium on Computational geometry, pages , New York, NY, USA, ACM. [6] T. M. Chan. Manuscript: A minimalist s implementation of an approximate nearest neighbor algorithm in fixed dimensions, 6. [7] J. Chhugani, B. Purnomo, S. Krishnan, J. Cohen, S. Venkatasubramanian, and D. S. Johnson. vlod: High-fidelity walkthrough of large virtual environments. IEEE Transactions on Visualization and Computer Graphics, 111):35 47, 5. [8] U. Clarenz, M. Rumpf, and A. Telea. Finite elements on point based surfaces. In Proc. EG Symposium of Point Based Graphics SPBG 4), pages 1 11, 4. [9] K. L. Clarkson. Fast algorithms for the all nearest neighbors problem. In FOCS 83: Proceedings of the Twenty-fourth Symposium on Foundations of Computer Science, Tucson, AZ, November Included in PhD Thesis. [1] K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In G. Shakhnarovich, T. Darrell, and P. Indyk, editors, Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pages MIT Press, 6. [11] D. Cotting, T. Weyrich, M. Pauly, and M. Gross. Robust watermarking of point-sampled geometry. In SMI 4: Proceedings of the Shape Modeling International 4, pages 33 4, Washington, DC, USA, 4. IEEE Computer Society. [1] M. T. Dickerson and D. Eppstein. Algorithms for proximity problems in higher dimensions. Computational Geometry Theory & Applications, 55):77 91, January [13] S. Fleishman, D. Cohen-Or, and C. T. Silva. Robust moving leastsquares fitting with sharp features. ACM Trans. Graph., 43):544 55, 5. [14] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS 99: Proceedings of the 4th Annual Symposium on Foundations of Computer Science, pages 85 98, Washington, DC, USA, IEEE Computer Society. [15] D. R. Karger and M. Ruhl. Finding nearest neighbors in growthrestricted metrics. In STOC : Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages , New York, NY, USA,. ACM. [16] Swanwa Liao, Mario A. Lopez, and Scott T. Leutenegger. High dimensional similarity search with space filling curves. In Proceedings of the 17th International Conference on Data Engineering, pages 615 6, Washington, DC, USA, 1. IEEE Computer Society. [17] Michael D. Lieberman, Jagan Sankaranarayanan, and Hanan Samet. A fast similarity join algorithm using graphics processing units. In ICDE 8: Proceedings of the 8 IEEE 4th International Conference on Data Engineering, pages , Washington, DC, USA, 8. IEEE Computer Society. [18] N. J. Mitra and A. Nguyen. Estimating surface normals in noisy point cloud data. In SCG 3: Proceedings of the nineteenth annual symposium on Computational geometry, pages 3 38, New York, NY, USA, 3. ACM. [19] G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. In Technical Report,IBM Ltd, [] D. Mount. : Library for Approximate Nearest Neighbor Searching, mount//. [1] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI 7: Proceedings of the 7 ACM SIGPLAN conference on Programming language design and implementation, pages 89 1, New York, NY, USA, 7. ACM. [] J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS 84: Proceedings of the 3rd ACM SIGACT-SIGMOD symposium on Principles of database systems, pages , New York, NY, USA, ACM. [3] R. Pajarola. Stream-processing points. In Proceedings IEEE Visualization, 5, Online., pages Computer Society Press, 5. [4] M. Pauly, M. Gross, and L. P. Kobbelt. Efficient simplification of point-sampled surfaces. In VIS : Proceedings of the conference on Visualization, pages , Washington, DC, USA,. IEEE Computer Society. [5] M. Pauly, R. Keiser, L. P. Kobbelt, and M. Gross. Shape modeling with point-sampled geometry. ACM Trans. Graph., 3):641 65, 3. [6] J. Sankaranarayanan, H. Samet, and A. Varshney. A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput. Graph., 31): , 7. [7] H. Tropf and H. Herzog. Multidimensional range search in dynamically balanced trees. Angewandte Informatik, :71 77, [8] P. Tsigas and Y. Zhang. A simple, fast parallel implementation of quicksort and its performance evaluation on SUN Enterprise 1. Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, :37, 3. [9] P. M. Vaidya. An On log n) algorithm for the all-nearestneighbors problem. Discrete Comput. Geom., 4):11 115, Michael Connor received a MS in Computer Science from Florida State University in 7, where he is currently a PhD student. His primary research interests are Computational Geometry and Parallel Algorithms. Piyush Kumar received his PhD from Stony Brook University in 4. His primary research interests are in Algorithms. He is currently an Assistant Professor at the Department of Computer Science at Florida State University. He was awarded the FSU First Year Assistant Professor Award in 5, the NSF CAREER Award in 7, and the FSU Innovator Award in 8. His research is funded by grants from NSF, FSU, and AMD.

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

The LCA Problem Revisited

The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Lecture 1: Introduction and convex hulls 1 Geometry: points, lines,... Plane (two-dimensional), R 2 Space (three-dimensional), R 3 Space (higher-dimensional), R d A point in the plane, 3-dimensional space,

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,

Virtuoso and Database Scalability

Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications Jie Gao Department of Computer Science Stanford University Joint work with Li Zhang Systems Research Center Hewlett-Packard

Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

Using Data-Oblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science

Using Data-Oblivious Algorithms for Private Cloud Storage Access Michael T. Goodrich Dept. of Computer Science Privacy in the Cloud Alice owns a large data set, which she outsources to an honest-but-curious

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

Divide-and-Conquer. Three main steps : 1. divide; 2. conquer; 3. merge.

Divide-and-Conquer Three main steps : 1. divide; 2. conquer; 3. merge. 1 Let I denote the (sub)problem instance and S be its solution. The divide-and-conquer strategy can be described as follows. Procedure

Lecture 4 Online and streaming algorithms for clustering

CSE 291: Geometric algorithms Spring 2013 Lecture 4 Online and streaming algorithms for clustering 4.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE

I/O-Efficient Spatial Data Structures for Range Queries

I/O-Efficient Spatial Data Structures for Range Queries Lars Arge Kasper Green Larsen MADALGO, Department of Computer Science, Aarhus University, Denmark E-mail: large@madalgo.au.dk,larsen@madalgo.au.dk

Fast and Robust Normal Estimation for Point Clouds with Sharp Features

1/37 Fast and Robust Normal Estimation for Point Clouds with Sharp Features Alexandre Boulch & Renaud Marlet University Paris-Est, LIGM (UMR CNRS), Ecole des Ponts ParisTech Symposium on Geometry Processing

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992

CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992 Original Lecture #6: 28 January 1991 Topics: Triangulating Simple Polygons

Oracle8i Spatial: Experiences with Extensible Databases

Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction

Arrangements And Duality

Arrangements And Duality 3.1 Introduction 3 Point configurations are tbe most basic structure we study in computational geometry. But what about configurations of more complicated shapes? For example,

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of

Lecture 6 Online and streaming algorithms for clustering

CSE 291: Unsupervised learning Spring 2008 Lecture 6 Online and streaming algorithms for clustering 6.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service Ahmad Pahlavan Tafti 1, Hamid Hassannia 2, and Zeyun Yu 1 1 Department of Computer Science, University of Wisconsin -Milwaukee,

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis Linda Shapiro Today Registration should be done. Homework 1 due 11:59 pm next Wednesday, January 14 Review math essential

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li Advised by: Dave Mount May 22, 2014 1 INTRODUCTION In this report we consider the implementation of an efficient

Binary Space Partitions for Axis-Parallel Line Segments: Size-Height Tradeoffs

Binary Space Partitions for Axis-Parallel Line Segments: Size-Height Tradeoffs Sunil Arya Department of Computer Science The Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong

28 Closest-Point Problems -------------------------------------------------------------------

28 Closest-Point Problems ------------------------------------------------------------------- Geometric problems involving points on the plane usually involve implicit or explicit treatment of distances

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.

Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Previous Lecture 13 Indexes for Multimedia Data 13.1

GPUs for Scientific Computing

GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

arxiv: v1 [math.mg] 6 May 2014

DEFINING RELATIONS FOR REFLECTIONS. I OLEG VIRO arxiv:1405.1460v1 [math.mg] 6 May 2014 Stony Brook University, NY, USA; PDMI, St. Petersburg, Russia Abstract. An idea to present a classical Lie group of

Predicting CPU Availability of a Multi-core Processor Executing Concurrent Java Threads

Predicting CPU Availability of a Multi-core Processor Executing Concurrent Java Threads Khondker Shajadul Hasan 1, Nicolas G. Grounds 2, John K. Antonio 1 1 School of Computer Science, University of Oklahoma,

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

An Adaptive Index Structure for High-Dimensional Similarity Search

An Adaptive Index Structure for High-Dimensional Similarity Search Abstract A practical method for creating a high dimensional index structure that adapts to the data distribution and scales well with

Muse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0

Muse Server Sizing 18 June 2012 Document Version 0.0.1.9 Muse 2.7.0.0 Notice No part of this publication may be reproduced stored in a retrieval system, or transmitted, in any form or by any means, without

ABSTRACT. For example, circle orders are the containment orders of circles (actually disks) in the plane (see [8,9]).

Degrees of Freedom Versus Dimension for Containment Orders Noga Alon 1 Department of Mathematics Tel Aviv University Ramat Aviv 69978, Israel Edward R. Scheinerman 2 Department of Mathematical Sciences

Off-line Model Simplification for Interactive Rigid Body Dynamics Simulations Satyandra K. Gupta University of Maryland, College Park

NSF GRANT # 0727380 NSF PROGRAM NAME: Engineering Design Off-line Model Simplification for Interactive Rigid Body Dynamics Simulations Satyandra K. Gupta University of Maryland, College Park Atul Thakur

Fast Multipole Method for particle interactions: an open source parallel library component

Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,

Voronoi Treemaps in D3

Voronoi Treemaps in D3 Peter Henry University of Washington phenry@gmail.com Paul Vines University of Washington paul.l.vines@gmail.com ABSTRACT Voronoi treemaps are an alternative to traditional rectangular

PROVABLY GOOD PARTITIONING AND LOAD BALANCING ALGORITHMS FOR PARALLEL ADAPTIVE N-BODY SIMULATION

SIAM J. SCI. COMPUT. c 1998 Society for Industrial and Applied Mathematics Vol. 19, No. 2, pp. 635 656, March 1998 019 PROVABLY GOOD PARTITIONING AND LOAD BALANCING ALGORITHMS FOR PARALLEL ADAPTIVE N-BODY

Approximation Errors in Computer Arithmetic (Chapters 3 and 4)

Approximation Errors in Computer Arithmetic (Chapters 3 and 4) Outline: Positional notation binary representation of numbers Computer representation of integers Floating point representation IEEE standard

Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining

Understanding the Benefits of IBM SPSS Statistics Server

IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

Stationary random graphs on Z with prescribed iid degrees and finite mean connections Maria Deijfen Johan Jonasson February 2006 Abstract Let F be a probability distribution with support on the non-negative

Solution of Linear Systems

Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

FCE: A Fast Content Expression for Server-based Computing

FCE: A Fast Content Expression for Server-based Computing Qiao Li Mentor Graphics Corporation 11 Ridder Park Drive San Jose, CA 95131, U.S.A. Email: qiao li@mentor.com Fei Li Department of Computer Science

A Flexible Cluster Infrastructure for Systems Research and Software Development

Award Number: CNS-551555 Title: CRI: Acquisition of an InfiniBand Cluster with SMP Nodes Institution: Florida State University PIs: Xin Yuan, Robert van Engelen, Kartik Gopalan A Flexible Cluster Infrastructure

Segmentation of building models from dense 3D point-clouds

Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute

SQL Server 2005 Features Comparison

Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

Algorithms and Data Structures

Algorithm Analysis Page 1 BFH-TI: Softwareschule Schweiz Algorithm Analysis Dr. CAS SD01 Algorithm Analysis Page 2 Outline Course and Textbook Overview Analysis of Algorithm Pseudo-Code and Primitive Operations

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies

A Nearest Neighbor Data Structure for Graphics Hardware

A Nearest Neighbor Data Structure for Graphics Hardware Lawrence Cayton Max Planck Institute for Biological Cybernetics lcayton@tuebingen.mpg.de ABSTRACT Nearest neighbor search is a core computational

Parallel Scalable Algorithms- Performance Parameters

www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

Lattice Point Geometry: Pick s Theorem and Minkowski s Theorem. Senior Exercise in Mathematics. Jennifer Garbett Kenyon College

Lattice Point Geometry: Pick s Theorem and Minkowski s Theorem Senior Exercise in Mathematics Jennifer Garbett Kenyon College November 18, 010 Contents 1 Introduction 1 Primitive Lattice Triangles 5.1

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

Windows Server Performance Monitoring

Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

Delaunay Based Shape Reconstruction from Large Data

Delaunay Based Shape Reconstruction from Large Data Tamal K. Dey Joachim Giesen James Hudson Ohio State University, Columbus, OH 4321, USA Abstract Surface reconstruction provides a powerful paradigm for

Cluster Computing at HRI

Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing

Shortest Inspection-Path. Queries in Simple Polygons

Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote B 05-05 April 2005 Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote Institut für Informatik,

Load Balancing Between Heterogenous Computing Clusters

Load Balancing Between Heterogenous Computing Clusters Siu-Cheung Chau Dept. of Physics and Computing, Wilfrid Laurier University, Waterloo, Ontario, Canada, N2L 3C5 e-mail: schau@wlu.ca Ada Wai-Chee Fu

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

Fast Matching of Binary Features

Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

ON THE COMPLEXITY OF THE GAME OF SET. {kamalika,pbg,dratajcz,hoeteck}@cs.berkeley.edu

ON THE COMPLEXITY OF THE GAME OF SET KAMALIKA CHAUDHURI, BRIGHTEN GODFREY, DAVID RATAJCZAK, AND HOETECK WEE {kamalika,pbg,dratajcz,hoeteck}@cs.berkeley.edu ABSTRACT. Set R is a card game played with a

RevoScaleR Speed and Scalability

EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

Planar Tree Transformation: Results and Counterexample

Planar Tree Transformation: Results and Counterexample Selim G Akl, Kamrul Islam, and Henk Meijer School of Computing, Queen s University Kingston, Ontario, Canada K7L 3N6 Abstract We consider the problem

A Brief Introduction to Property Testing

A Brief Introduction to Property Testing Oded Goldreich Abstract. This short article provides a brief description of the main issues that underly the study of property testing. It is meant to serve as

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

About the inverse football pool problem for 9 games 1

Seventh International Workshop on Optimal Codes and Related Topics September 6-1, 013, Albena, Bulgaria pp. 15-133 About the inverse football pool problem for 9 games 1 Emil Kolev Tsonka Baicheva Institute

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

Surface Reconstruction from a Point Cloud with Normals

Surface Reconstruction from a Point Cloud with Normals Landon Boyd and Massih Khorvash Department of Computer Science University of British Columbia,2366 Main Mall Vancouver, BC, V6T1Z4, Canada {blandon,khorvash}@cs.ubc.ca

Report Paper: MatLab/Database Connectivity

Report Paper: MatLab/Database Connectivity Samuel Moyle March 2003 Experiment Introduction This experiment was run following a visit to the University of Queensland, where a simulation engine has been

Solving Geometric Problems with the Rotating Calipers *

Solving Geometric Problems with the Rotating Calipers * Godfried Toussaint School of Computer Science McGill University Montreal, Quebec, Canada ABSTRACT Shamos [1] recently showed that the diameter of

Faster deterministic integer factorisation

David Harvey (joint work with Edgar Costa, NYU) University of New South Wales 25th October 2011 The obvious mathematical breakthrough would be the development of an easy way to factor large prime numbers

Fairness in Routing and Load Balancing

Fairness in Routing and Load Balancing Jon Kleinberg Yuval Rabani Éva Tardos Abstract We consider the issue of network routing subject to explicit fairness conditions. The optimization of fairness criteria

THE concept of Big Data refers to systems conveying

EDIC RESEARCH PROPOSAL 1 High Dimensional Nearest Neighbors Techniques for Data Cleaning Anca-Elena Alexandrescu I&C, EPFL Abstract Organisations from all domains have been searching for increasingly more

Network File Storage with Graceful Performance Degradation

Network File Storage with Graceful Performance Degradation ANXIAO (ANDREW) JIANG California Institute of Technology and JEHOSHUA BRUCK California Institute of Technology A file storage scheme is proposed

Load Balancing between Computing Clusters Siu-Cheung Chau Dept. of Physics and Computing, Wilfrid Laurier University, Waterloo, Ontario, Canada, NL 3C5 e-mail: schau@wlu.ca Ada Wai-Chee Fu Dept. of Computer