GPU-based Cloud Computing for Comparing the Structure of Protein Binding Sites


Matthias Leinweber 1, Lars Baumgärtner 1, Marco Mernberger 1, Thomas Fober 1, Eyke Hüllermeier 1, Gerhard Klebe 2, Bernd Freisleben 1

1 Department of Mathematics & Computer Science and Center for Synthetic Microbiology, University of Marburg, Hans-Meerwein-Str. 3, D-35032 Marburg, Germany
2 Department of Pharmacy and Center for Synthetic Microbiology, University of Marburg, Marbacher Weg 6, D-35037 Marburg, Germany
1 {leinweberm, lbaumgaertner, mernberger, thomas, eyke, freisleb}@informatik.uni-marburg.de
2 klebe@staff.uni-marburg.de

Abstract

In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on a subset of protein structure data contained in the CavBase, providing a structural comparison of protein binding sites on a much larger scale than in previous research efforts reported in the literature.

Index Terms: GPU, Cloud computing, protein binding sites, structure comparison, graph alignment, OpenCL.

I. INTRODUCTION

A major goal in synthetic biology is the manipulation of the genetic setup of living cells to introduce novel biochemical pathways and alter existing ones. A prerequisite for the constitution of new biochemical pathways in microorganisms is a working knowledge of the biochemical function of the proteins of interest. Since assessing protein function experimentally is time-consuming and in some cases even infeasible, the prediction of protein function is a central task in bioinformatics.
Typically, the function of a protein is inferred from similar proteins with known functions, most prominently by a sequence comparison, owing to the observation that proteins with an amino acid sequence similarity larger than 40% tend to have similar functions [19]. Accordingly, a plethora of algorithms exists for comparing protein sequences, including the well-known NCBI BLAST algorithm [1]. Yet, below this threshold of 40%, the results of sequence comparisons become more and more uncertain [11]. In cases where a sequence-based inference of protein function remains inconclusive, a structural comparison can provide further insights and uncover more remote similarities [18], especially when focusing on functionally important regions of proteins, such as protein binding sites. Several algorithms are known for comparing possible protein binding sites based on structural data [9], [17], [3]. However, such algorithms have much longer runtimes than their sequence-based counterparts, which severely limits their use for large-scale comparisons.

In this paper, we present a novel approach to significantly speed up the computation times of a recent graph-based algorithm for performing a structural comparison of protein binding sites, called SEGA [15], by using the digital ecosystem of a GPU-based Cloud computing infrastructure. The original CPU-based Java version of SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on protein structure data of the CavBase [16], providing a structural comparison of protein binding sites on a much larger scale than in previous research efforts reported in the literature.

This paper is organized as follows. Section II discusses related work. The SEGA algorithm is described in Section III, and its GPU implementation is presented in Section IV. Experimental results are discussed in Section V.
Section VI concludes the paper and outlines areas for future work.

II. RELATED WORK

Several graph-based algorithms for protein structure analysis have been proposed in the literature. For example, a subgraph isomorphism algorithm [20] has been used by Artymiuk et al. [2] to identify amino acid side chain patterns. Furthermore, Xie and Bourne [22] have proposed an approach utilizing weighted subgraph isomorphism, while Jambon et al. [6] employ heuristics to find correspondences. A more recent approach based on fuzzy histograms to find similarities in structural protein data has been presented by Fober and Hüllermeier [5]. Fober et al. [4] have shown that pair-wise or multiple alignments on structural protein information can be achieved using labeled point clouds, i.e., sets of vertices in a three-dimensional coordinate system.

Apparently, algorithms for performing a structural comparison of protein binding sites have not yet been designed to run on modern GPUs. However, several sequence-based protein analysis approaches have been ported to GPUs. For example, NCBI BLAST runs on GPUs to achieve significant speedups [21]. Other projects, such as CUDASW++ and CUDA-BLASTP [13], [14], [12], [8], have shown that GPUs can

be used as cheap and powerful accelerators for well-known algorithms for performing local sequence alignment, such as the Smith-Waterman algorithm.

III. THE SEGA ALGORITHM

The SEGA algorithm constructs a global graph alignment of complete node-labeled and edge-weighted graphs, i.e., a one-to-one correspondence of nodes. In principle, SEGA realizes a divide-and-conquer strategy by first solving a correspondence problem on a local scale to derive a distance measure on nodes. This local distance measure is used in a second step to solve another correspondence problem on a global scale, by deriving a mutual assignment of nodes to construct a global graph alignment.

To derive a local distance measure, nodes are compared in terms of their immediate surroundings, i.e., the node neighborhood. This node neighborhood is defined by the subgraph formed by the $n$ nearest neighbor nodes. Since SEGA has been developed for graphs representing protein binding sites based on CavBase data [16], nodes represent pseudocenters, i.e., spatial descriptors of physicochemical properties present within a binding site. Edges are weighted with the Euclidean distance between pseudocenters. The basic assumption is that the more similar the immediate surroundings of two pseudocenters are, the higher the likelihood that they belong to corresponding protein regions. Comparing the node neighborhood thus corresponds to comparing the spatial constellation of physicochemical properties in close proximity of these pseudocenters. If these are highly similar, a mutual assignment of these nodes should be favored.

Given two input graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with $|V_1| = m_1$ and $|V_2| = m_2$, a local $m_1 \times m_2$ distance matrix $D = (d_{ij})_{1 \le i \le m_1,\, 1 \le j \le m_2}$ is obtained by extracting the induced neighborhood subgraph for each center node $v_i \in V_1$ and $v_j \in V_2$, as given by the set of nodes comprising the center node itself and its $n$ closest neighbor nodes.
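The neighborhood extraction described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names and the toy coordinates are assumptions made for illustration, and real CavBase pseudocenters would replace the hypothetical `points` list.

```python
import math

def pairwise_distances(points):
    """Return the full m x m matrix of Euclidean distances (the edge weights)."""
    return [[math.dist(p, q) for q in points] for p in points]

def nearest_neighbors(dist, n):
    """For every center node, return the indices of its n closest other nodes."""
    result = []
    for i, row in enumerate(dist):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j])  # closest first
        result.append(others[:n])
    return result

# Hypothetical toy cavity: five pseudocenters placed on a line.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0),
          (7.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
dist = pairwise_distances(points)
neighbors = nearest_neighbors(dist, n=2)
```

Each entry of `neighbors`, together with its center node, defines the induced neighborhood subgraph that is compared in the local step.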
To obtain a distance measure between two nodes $v_i$ and $v_j$, the corresponding subgraphs are decomposed into the set of all triangles containing the center node (see Figure 1). Then, an assignment problem is solved to obtain the number of matching triangles. Triangles are considered to match if a mutual assignment of nodes exists for which the node labels of corresponding neighbor nodes are identical and all corresponding edge weights are within an $\epsilon$-range of each other. In other words, a superposition preserving node labels (exempting the center node) and edge lengths is obtained. The node labels of the center nodes are not required to match; this introduces a certain level of tolerance, which is necessary when dealing with molecular structure data. Likewise, the parameter $\epsilon$ is a tolerance threshold determining the allowed deviation of edge lengths.

The obtained distance matrix $D$ can be considered as a cost matrix, indicating the cost for each potential assignment of nodes $v_i \in V_1$ and $v_j \in V_2$. In the second step of the algorithm, an optimal assignment of nodes from $V_1$ and $V_2$ is derived incrementally, by first realizing the assignment of nodes that have the smallest distance to each other before assigning the next pair of nodes.

Fig. 1. Decomposition of the neighborhood of a node with $n_{neigh} = 4$. The subgraph defined by the $n_{neigh}$ nearest nodes is decomposed into triangles containing the center node.

If ambiguities arise, SEGA resorts to global information by selecting assignments for which both nodes preferably show a small deviation with respect to an already obtained partial solution. More precisely, the relative position of candidate nodes to each node in the partial solution is determined and used to calculate another cost matrix, containing a measure of the geometric deviation for each candidate pair. The actual assignments are then obtained by solving another optimal assignment problem, using the Hungarian algorithm [10].
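The triangle decomposition and the matching criterion of the local step can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the triangle representation, the label names, and the toy coordinates are hypothetical, and the number of matching triangles is obtained here with a simple augmenting-path bipartite matching, which may differ from the assignment solver used in SEGA.

```python
import math
from itertools import combinations

def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]

def triangles(center, neighbors, labels, dist):
    """All triangles containing the center node, stored as
    ((label_a, d(center, a)), (label_b, d(center, b)), d(a, b))."""
    return [((labels[a], dist[center][a]), (labels[b], dist[center][b]), dist[a][b])
            for a, b in combinations(neighbors, 2)]

def triangles_match(t1, t2, eps):
    """True if some assignment of the two non-center corners keeps the
    neighbor labels identical and all three edge lengths within eps."""
    (a1, b1, ab1), (a2, b2, ab2) = t1, t2
    if abs(ab1 - ab2) > eps:
        return False

    def corner_ok(c1, c2):
        return c1[0] == c2[0] and abs(c1[1] - c2[1]) <= eps

    return ((corner_ok(a1, a2) and corner_ok(b1, b2)) or
            (corner_ok(a1, b2) and corner_ok(b1, a2)))

def count_matching_triangles(ts1, ts2, eps):
    """Size of a maximum one-to-one matching between two triangle lists."""
    match_of = {}  # index in ts2 -> index in ts1

    def try_assign(i, seen):
        for j, t2 in enumerate(ts2):
            if j in seen or not triangles_match(ts1[i], t2, eps):
                continue
            seen.add(j)
            if j not in match_of or try_assign(match_of[j], seen):
                match_of[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(ts1)))

# Hypothetical toy neighborhoods; node 0 is the center, its label is ignored.
dist1 = distance_matrix([(0, 0), (1, 0), (0, 2), (3, 0)])
dist2 = distance_matrix([(0, 0), (1.1, 0), (0, 2), (3, 0)])
ts1 = triangles(0, [1, 2, 3], ["donor", "acceptor", "donor", "pi"], dist1)
ts2 = triangles(0, [1, 2, 3], ["acceptor", "acceptor", "donor", "aliphatic"], dist2)
ts3 = triangles(0, [1, 2, 3], ["acceptor", "acceptor", "donor", "pi"], dist2)

one_match = count_matching_triangles(ts1, ts2, eps=0.2)
three_matches = count_matching_triangles(ts1, ts3, eps=0.2)
```

In the second toy comparison only the center labels differ, so all three triangles match; in the first, a differing neighbor label leaves a single matching triangle. How the match count is converted into the distance $d_{ij}$ is not specified here and is left out of the sketch.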
A more detailed description of the approach can be found in Mernberger et al. [15].

IV. SEGA IN A GPU CLOUD

In this section, a version of the SEGA algorithm running on GPU hardware and a pipelined computation framework for performing large-scale GPU-based structural comparisons of protein binding sites in a Cloud environment are presented.

A. GPU Implementation of SEGA

A common problem when developing applications to run on GPU hardware is that it is not easy to utilize all resources of a computational node efficiently. If the complete algorithm is implemented to run on a GPU, the host CPU's work only consists of controlling the device, which usually is not sufficient to operate the processor at full load. The SEGA algorithm is well suited for a division into a GPU and a CPU part. The part of the algorithm that solves the correspondence problem has been rewritten to run on GPU hardware using OpenCL. The iterative part constructing a global alignment is computed on the host CPU, supported by intermediate results generated by the GPU part of the implementation.

The creation of the cost matrix $D$ (see Section III) is divided into four OpenCL kernels. The first OpenCL kernel builds input graphs $G = (V, E)$ from the point cloud information provided by the protein cavity database. The data is stored in an $m \times m$ matrix, where $m = |V|$ is the number of points describing this cavity. Based on the data parallelism in this

task, this kernel can run with $m^2$ threads at once, where each thread computes one pair-wise distance.

The second OpenCL kernel constructs an intermediate matrix for a protein cavity. This matrix contains, for each node $v \in V$ of $G$, the indices of its $n$ nearest neighbors. Each row of this matrix is data-independent and contains the indices of the $n$ smallest values in the corresponding row of the pair-wise distance matrix. It is calculated by $m \cdot (m/2)$ threads, where $m/2$ threads per row determine the $n$ smallest values using parallel reduction and block-shared memory.

A neighborhood size of $n$ results in $l = n(n-1)/2$ triangles for each node $v \in V$. These triangles are stored in an $m \times l$ matrix $Z$ that is created by the third OpenCL kernel. This kernel is executed with $m \cdot l$ threads in parallel, using a vector of indices that indicates which of the $n$ nearest neighbors is combined with which other neighbor.

The last OpenCL kernel combines two triangle matrices $Z_1$ and $Z_2$ into a distance matrix $D$ with $m_1 \times m_2$ elements. It is executed with $m_1 \cdot m_2$ threads, where each thread loops over $l \cdot l$ triangle pairs, computing the cost for a match. The final alignment is then computed from the distance matrix $D$ as described in Section III, supported by intermediate results generated by the OpenCL part.

Fig. 2. SEGA GPU architecture overview.

B. Management Framework

We have developed a software framework for managing the GPU and CPU computations involved in our implementation. The framework consists of six major components. Three components control the GPU hardware, the fourth component is responsible for selecting objects for comparison, the fifth component offers a service to manage thread pools for workloads on CPUs, and the sixth component provides progress monitoring functionality. The six components communicate via queues, which allow multithreading inside each component and additionally provide a viable way of utilizing multiple GPU devices on a single compute node.
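The idea of coupling components through queues can be sketched with the Python standard library. This is only an illustration of the pattern, not the Java framework itself: the `stage` helper, the three placeholder work functions, and the queue size are assumptions, and the "cost" computed here is a stand-in for real kernel results.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down

def stage(work, inbox, outbox):
    """Run `work` on every item from `inbox` and forward the result to `outbox`."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown downstream
                return
            outbox.put(work(item))
    thread = threading.Thread(target=loop)
    thread.start()
    return thread

# Bounded queues limit how many work items are in flight at once.
to_upload, to_launch, to_collect, done = (queue.Queue(maxsize=4) for _ in range(4))

threads = [
    # Placeholder for copying input data to the device.
    stage(lambda pair: {"pair": pair, "uploaded": True}, to_upload, to_launch),
    # Placeholder for the kernel launch; the "cost" is a dummy value.
    stage(lambda job: {**job, "cost": sum(job["pair"])}, to_launch, to_collect),
    # Placeholder for reading results back to the host.
    stage(lambda job: (job["pair"], job["cost"]), to_collect, done),
]

for pair in [(0, 1), (0, 2), (1, 2)]:
    to_upload.put(pair)
to_upload.put(STOP)

results = []
while (item := done.get()) is not STOP:
    results.append(item)
for t in threads:
    t.join()
```

Because every stage only talks to its neighbors through a queue, a stage can be given several worker threads, or duplicated per GPU device, without changing the other stages.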
Furthermore, this design offers the possibility of repeated execution of a computation on GPU and CPU hardware. This can easily be realized by states inside a calculation object that contains a set of tasks to handle a group of comparisons. A calculation object contains two important pieces of information: (a) a description of the entities to be compared, and (b) a set of instructions that are to be issued when a comparison is performed.

Figure 2 shows the orchestration of the six components involved in the comparison of the protein binding sites using our GPU-enhanced implementation of the SEGA algorithm. It also illustrates the data flow through the framework.

The Selector component is the entry point of the framework. It provides both an interconnection to a data store with caching capabilities and the program logic that controls which entities should be compared next. To perform the SEGA comparisons, the Selector combines a set of protein cavity identifiers and loads the point cloud data. This information is passed via a queue to the DataProcessor. Additionally, the Selector stores meta-information in the Monitor component, such as the tasks in progress. In our case, no further work of the algorithm depends on the CPU at this point, so the next component belongs to the GPU.

The decision to split the GPU part into three components is mainly due to the design of modern GPU hardware. The latest generation of GPU hardware offers independent control flows of memory reads, memory writes and kernel execution induced by the host system. Therefore, the DataProcessor component, which contains an arbitrary number of threads, is responsible for converting (if needed) and transferring data from the host system to the GPU device memory. Moreover, each GPU device is controlled by its own set of GPU components to ensure maximum utilization of the given resources. For SEGA, the point cloud data is copied into OpenCL buffers and transferred to the GPU.
At this point, we encountered a possible bottleneck in the management of OpenCL memory objects: handling several thousands of objects dramatically reduced the allocation performance. Thus, we had to introduce an additional component responsible for ensuring a simple and efficient reuse of memory objects. Additionally, this allows a safer use of GPU device memory because such a pre-allocation

guarantees that the GPU device memory is not exceeded during execution, and it also limits the number of computations currently in progress.

After a successful write operation to the GPU, the calculation object containing the meta-information is passed via an additional queue to the Launcher component. The Launcher executes the corresponding GPU kernels, which in the case of SEGA are responsible for creating the polygon data and combining two distance matrices. After completion, the calculation object is pushed into the next queue.

The last GPU-related component is the Dispatcher. It is responsible for reading the results of the kernel execution back to host memory and, if necessary, for processing the data further. Afterwards, the results are pushed to the ThreadService. Here, the alignments of the polygons are calculated, and the results are stored. After a computation has finished successfully, the Monitor component is informed.

The Monitor fulfills two major tasks. First, it creates an interconnection between the Selector and the ThreadService for storing the results. This is necessary to know whether all combinations have been successfully calculated. Additionally, it records the progress of the computation on persistent storage. If a computation is interrupted for unpredictable reasons, such as system failures or disk I/O errors, the computation can be resumed at the correct position.

The described framework has been implemented in Java using the JogAmp JOCL library [7] for controlling the OpenCL platform.

C. Cloud Deployment

A common approach for parallelizing a computational problem is its division into three steps: work partitioning and distribution, task computation, and result collection. In the case of a commutative comparison where a self-comparison is not necessary, an input set of $n$ elements results in a total of $n(n-1)/2$ computations. A straightforward approach is to divide the total number of computations by the available number of Cloud nodes.
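A minimal sketch of how the $n(n-1)/2$ comparisons can be addressed by a single integer and split into packages is given below. The function names and the contiguous splitting scheme are assumptions made for illustration; the paper does not state how its identifiers or packages are laid out.

```python
def pair_count(n):
    """Number of unordered pairs without self-comparison."""
    return n * (n - 1) // 2

def pair_from_index(k, n):
    """Map a linear index 0 <= k < n(n-1)/2 to the k-th pair (i, j) with i < j."""
    i = 0
    row_len = n - 1  # number of pairs whose first element is i
    while k >= row_len:
        k -= row_len
        i += 1
        row_len -= 1
    return i, i + 1 + k

def packages(n, num_packages):
    """Split the index range into contiguous, near-equal packages [start, stop)."""
    total = pair_count(n)
    bounds = [total * p // num_packages for p in range(num_packages + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy example: 5 binding sites give 10 comparisons, split into 4 packages.
all_pairs = [pair_from_index(k, 5) for k in range(pair_count(5))]
work = packages(5, 4)
```

A node that receives a package only needs the two integers `start` and `stop` to reconstruct every pair it has to compare, and choosing more packages than nodes leaves spare packages that can be handed out on demand.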
If every comparison is indexed by a single unique identifier, a single node simply needs the identifier to perform a comparison. However, a better approach is to divide the total number of comparisons by an arbitrary number that is larger than the available number of nodes. This allows one to start the result collection phase before the end of the task computation phase and, moreover, enables an on-demand scheduling of tasks to other nodes in case a node fails.

The work partitioning and distribution phase also includes the distribution of the input data. For this purpose, several approaches are possible, such as data replication, network exports, and cluster file systems. Fortunately, in our case the required data of the cavity database could be reduced to about 14 MB. Consequently, the data has been transferred and loaded into the main memory of each node. Given the overall runtimes, this has a negligible impact on the total computation time. After data and task distribution, the nodes can calculate their part(s). When a task has finished, its results can be collected from the Cloud and stored locally.

V. EVALUATION

To assess the performance of our approach, several experiments have been conducted. The evaluation is split into two parts. First, the performance gains of SEGA GPU compared to the original SEGA algorithm are investigated. Second, the results of a large-scale comparison of protein binding sites on Amazon's EC2 Cluster GPU Instances are presented. The structural data has been taken from the CavBase [16] maintained by the Cambridge Crystallographic Data Centre.

Fig. 3. SEGA benchmarks: (a) GPU part of SEGA GPU; (b) CPU part of SEGA GPU; (c) maximum of GPU and CPU parts of SEGA GPU; (d) original SEGA.

A. SEGA vs. SEGA GPU

The performance of the original SEGA implementation has been measured on a single core of an Intel Core 3.4 GHz with 8 GB RAM, whereas the performance of SEGA GPU has been measured on a single NVIDIA GeForce GTX 580 with 3 GB RAM.
The runtimes depend on the number of pseudocenters present in the protein cavities, and thus both SEGA versions

have been benchmarked using a subset of the CavBase with a large spectrum of numbers of pseudocenters. In particular, the subset consists of cavities where the numbers of pseudocenters range from 15 to 25. For each comparison, some cavities matching certain size requirements were selected and compared several times (1 times for SEGA GPU; 1 times for original SEGA) to calculate the average runtimes for a particular size combination.

The plots in Figure 3 show the runtimes depending on the number of pseudocenters of each cavity. Figure 3(a) shows the average runtime of the GPU part of a SEGA GPU run, and Figure 3(b) shows the runtime of the CPU part of a SEGA GPU run. It is evident that the required CPU runtime is often higher than the GPU runtime, but never twice as high. One could argue that in typical cluster nodes that offer GPU hardware, at least two physical CPU cores are available for each GPU. Instead, we decided to look at the worst case and compared the results with Figure 3(c). This plot shows the maximum of the two preceding graphs. Finally, Figure 3(d) shows the runtimes of the original SEGA implementation.

Figure 4 shows the SEGA GPU and original SEGA runtimes in a single plot. It is important to note that the z-axis has a logarithmic scale. It is evident that the SEGA GPU implementation is 1 to (with an average of 11) times faster than the original SEGA implementation, depending on the number of pseudocenters in each cavity.

Fig. 4. Comparison of original SEGA and SEGA GPU benchmarks.

Fig. 5. Pseudocenter distribution among the selected subset of the CavBase.

B. SEGA Amazon EC2

The main target platform for SEGA GPU is Amazon's EC2 Cluster GPU Instances. Each node (instance type: cg1.4xlarge) has two Intel Xeon X5570 CPUs, 22 GB RAM and two NVIDIA Tesla M2050 with 2 GB RAM. Benchmarks comparing the Tesla M2050 and the GeForce GTX 580 have shown that the GTX 580 is about two times faster than the Tesla. This matches the theoretical GFLOPS specifications from NVIDIA (single precision floating point).
Thus, the GPU runtime measured in the previous section corresponds to a single EC2 node.

The subset of the CavBase used in our experiments has been selected based on the following (pharmaceutically meaningful) criteria: the resolution of a cavity must be larger than 2.5 Å; the volume must be between 35 Å³ and 35 Å³; a protein must have at least 11 pseudocenters. This resulted in n = protein binding sites, leading to $n(n-1)/2$ = comparisons in total.

Fig. 6. Boxplots showing the randomly sampled runtime distributions: (a) runtime distribution for SEGA GPU; (b) runtime distribution for SEGA GPU (OpenCL) compared to the original SEGA implementation (pure Java).

Using Amazon's EC2 resources with associated costs makes it important to predict the expected total runtime of a computation, especially if a hard limit for the financial budget must be respected. According to Figure 5, the number of pseudocenters of the proteins in the selected subset of the CavBase is not uniformly distributed. Thus, to predict the total runtime, randomly sampled pairs from the CavBase were visualized with boxplots. The blue box enclosed by the lower and upper

quartile contains the middle 50% of the data. The distance between the upper and lower quartile defines the interquartile range (IQR), a measure of the spread of the data. The lower and upper whiskers visualize the remaining data that is not contained in the box defined by the lower and upper quartile; their length is bounded by 1.5 × IQR. Data points outside the whiskers are outliers and are marked by a cross. The 50th percentile (median) is visualized by a red line, the confidence interval (α =.5) for the mean by a triangle. Figure 6(a) shows the boxplot for the SEGA GPU implementation. Figure 6(b) shows a comparison between the original SEGA implementation and SEGA GPU to exemplify the performance gain. Based on the boxplot, a runtime per comparison of 1.7 ms was expected.

To use the infrastructure provided by Amazon EC2 efficiently, the entire computation was divided to run on 8 Amazon EC2 Cluster GPU Instances in parallel. The comparisons were grouped into 4096 packages and distributed by assigning 512 packages to each node. Given the runtime of a single comparison and a total number of about 1.5 billion comparisons, a runtime of about 24 days on eight EC2 nodes was expected. In reality, the computation took about 22 days to complete. The cost was about 6.7 US-$ ( comparisons 1,7 ms / 3.6. ms/h 1,234 US-$/h = US-$ for computations, the rest for storage and network traffic).

In contrast, performing the 1.5 billion comparisons on a single core of an Intel Core 3.4 GHz with about 3 ms runtime per comparison (see Figure 6(b)) would require about days ( 1 years); on a quad-core node with the same specifications, about 9.16 days ( 25 years) are required. If an Amazon High Quad CPU Instance with a cost of,4 US-$ per hour were used, the total cost would amount to about US-$.

VI. CONCLUSIONS

In this paper, we have presented a novel approach to significantly speed up the computation times of the SEGA algorithm for a structural comparison of protein binding sites by using the digital ecosystem of a GPU-based Cloud computing infrastructure. The original CPU-based Java version of SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on a subset of protein structure data of the CavBase, requiring an acceptable computation time of about three weeks. Thus, a structural approach to comparing protein binding sites becomes a viable alternative to sequence-based alignment algorithms.

There are several directions for future work. For example, a comparative analysis could be done for the entire protein space in the CavBase, which not only allows a classification of the protein space into structurally and functionally similar, homologous and non-homologous protein groups, but also supports the systematic search for unexpected similarities and functional relationships. Furthermore, other algorithms for a structural comparison of protein binding sites could be rewritten to run on GPU hardware to provide further insights.

ACKNOWLEDGEMENTS

This work is partially supported within the LOEWE program of the State of Hesse, Germany, the German Research Foundation (DFG), and a research grant provided by Amazon Web Services (AWS) in Education.

REFERENCES

[1] S. F. Altschul. BLAST Algorithm. John Wiley & Sons, Ltd, 2001.
[2] P. J. Artymiuk, A. R. Poirrette, H. M. Grindley, D. W. Rice, and P. Willett. A Graph-theoretic Approach to the Identification of Three-dimensional Patterns of Amino Acid Side-chains in Protein Structures. Journal of Molecular Biology, 243(2).
[3] T. Binkowski and A. Joachimiak. Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites. BMC Structural Biology, 8(1):45-68, 2008.
[4] T. Fober, G.
Glinca, G. Klebe, and E. Hüllermeier. Superposition and Alignment of Labeled Point Clouds. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(6), 2011.
[5] T. Fober and E. Hüllermeier. Similarity Measures for Protein Structures Based on Fuzzy Histogram Comparison. Computational Intelligence, pages 18-23, 2010.
[6] M. Jambon, A. Imberty, G. Delage, and C. Geourjon. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins, 52(2), 2003.
[7] JogAmp Community. JogAmp JOCL.
[8] M. A. Kentie. Biological Sequence Alignment on Graphics Processing Units. Master's thesis, Delft University of Technology, 2010.
[9] K. Kinoshita and H. Nakamura. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Science, 12(8), 2003.
[10] H. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics, 52(1):7-21, 2005.
[11] D. Lee, O. Redfern, and C. Orengo. Predicting protein function from sequence and structure. Nature Reviews Molecular Cell Biology, 8(12):995-1005, 2007.
[12] W. Liu, B. Schmidt, and W. Müller-Wittig. CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(6), Nov.
[13] Y. Liu, W. Huang, J. Johnson, and S. Vaidya. GPU accelerated Smith-Waterman. In Proceedings of the 6th International Conference on Computational Science - Volume Part IV, ICCS'06, Berlin, Heidelberg, 2006. Springer-Verlag.
[14] Y. Liu, D. Maskell, and B. Schmidt. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.
[15] M. Mernberger, G. Klebe, and E. Hüllermeier. SEGA: Semi-global graph alignment for structure-based protein comparison. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5), 2011.
[16] S. Schmitt, D. Kuhn, and G. Klebe.
A New Method to Detect Related Function Among Proteins Independent of Sequence and Fold Homology. Journal of Molecular Biology, 323(2):387-406, 2002.
[17] A. Stark and R. Russell. Annotation in three dimensions. PINTS: patterns in non-homologous tertiary structures. Nucleic Acids Research, 31(13), 2003.
[18] J. M. Thornton. From genome to function. Science, 292(5524), 2001.
[19] A. Todd, C. Orengo, and J. Thornton. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology, 307(4), 2001.
[20] J. R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of the ACM, 23(1):31-42.
[21] P. D. Vouzis and N. V. Sahinidis. GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics, 27(2), 2011.
[22] L. Xie and P. E. Bourne. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proceedings of the National Academy of Sciences of the United States of America, 105(14), 2008.

Local Alignment Tool Based on Hadoop Framework and GPU Architecture

Local Alignment Tool Based on Hadoop Framework and GPU Architecture Local Alignment Tool Based on Hadoop Framework and GPU Architecture Che-Lun Hung * Department of Computer Science and Communication Engineering Providence University Taichung, Taiwan clhung@pu.edu.tw *

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

Fuzzy Modeling of Labeled Point Cloud Superposition for the Comparison of Protein Binding Sites

Fuzzy Modeling of Labeled Point Cloud Superposition for the Comparison of Protein Binding Sites Fuzzy Modeling of Labeled Point Cloud Superposition for the Comparison of Protein Binding Sites Thomas Fober Eyke Hüllermeier Knowledge Engineering & Bioinformatics Group Mathematics and Computer Science

More information

Acceleration of Video Conversion on the GPU based Cloud
