Accelerating life sciences research


IBM Systems and Technology
Thought Leadership White Paper
June 2013

IBM Platform Symphony helps deliver improved performance for life sciences workloads using Contrail software

Contents
Addressing the challenges of genome assembly with Contrail
Accelerating results with IBM Platform Symphony
The benchmark environment
Selecting the E.coli model
Test methodology
Results
Interpreting the results
The additional benefits of Platform Symphony
Limitations and additional work
Conclusion
Appendix: Shell script for benchmark testing
Actual benchmark results captured over three successive comparative runs
Hadoop configuration files

New approaches to genomic analysis, such as next-generation sequencing, will play key roles in advancing scientific knowledge, facilitating the development of targeted drugs and delivering personalized healthcare. To capitalize on these new approaches, life sciences organizations need computing environments that can process tremendous amounts of data rapidly. Speed of analysis is critical in life sciences because it relates directly to the rate of discovery and to the cost-efficiency of employing genomic sequencing for personalized medicine on a large scale.

Contrail, a bioinformatics application, uses the Hadoop MapReduce framework to deliver gains in performance and cost-efficiency in genome assembly. By combining Contrail with IBM Platform Symphony, a commercial workload scheduler and grid manager, researchers can see even greater advantages. This paper presents the results of recent benchmark testing that demonstrate the advantages of using Platform Symphony in conjunction with Contrail.

Addressing the challenges of genome assembly with Contrail
Contrail is open-source software that was developed to solve key challenges associated with large-scale genome assembly. It enables de novo assembly of large genomes from short reads, bridging research in computational biology with advances in the Hadoop MapReduce framework.
The first step in analyzing a previously unsequenced organism is to assemble reads by merging similar reads into progressively longer sequences. Assemblers such as Velvet and Euler attempt to solve the assembly problem by constructing, simplifying and traversing a de Bruijn graph of the read sequences.[1] These assemblers primarily focus on correcting errors, reconstructing unambiguous regions and resolving short repeats. While they can manage small genomes, scaling them to larger, mammalian-sized genomes is challenging: they require constructing and manipulating graphs that are too large to fit in the memory of most computer systems. Larger models can require computing environments with terabytes of memory, and building such environments would be too expensive for most institutions.

Contrail addresses the memory limitation by recasting the algorithm to run on a distributed MapReduce framework, which avoids the need for massive amounts of memory on any individual system. Contrail relies on Hadoop to iteratively transform an on-disk representation of the assembly graph, allowing in-depth analysis even of large genomes on clusters of commodity computer systems running a Linux operating system.
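To make the de Bruijn idea concrete, the toy sketch below breaks each read into k-mers and treats every k-mer as an edge from its (k-1)-base prefix to its (k-1)-base suffix. The per-read loop plays the role of a map step, and `sort | uniq -c` plays the role of a shuffle-and-reduce step that merges identical edges and counts how many reads support them, loosely mirroring how a MapReduce framework can build the graph in pieces instead of in one machine's memory. The reads, `K=3` and the function name `emit_edges` are illustrative assumptions; this is not Contrail's implementation, which also compresses and simplifies the graph over many iterations.

```shell
#!/bin/bash
# Toy sketch only (not Contrail's code): derive de Bruijn graph edges from
# short reads in a map/reduce style using bash and coreutils.

# "Map" step: for every k-mer in one read, print an edge from the k-mer's
# (k-1)-base prefix to its (k-1)-base suffix.
emit_edges() {
  local read=$1 k=$2
  local i kmer
  for (( i = 0; i + k <= ${#read}; i++ )); do
    kmer=${read:i:k}
    echo "${kmer:0:k-1} ${kmer:1}"
  done
}

# Hypothetical reads and k. Real reads are longer, and the appendix
# script defaults to K=25.
K=3
for read in ACGTAC CGTACG; do
  emit_edges "$read" "$K"
done | sort | uniq -c   # "Reduce" step: merge identical edges, count read support
```

With these two overlapping reads, each of the four edges is reported with a count of 2, showing how overlapping reads reinforce the same path through the graph.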

Accelerating results with IBM Platform Symphony
Platform Symphony software offers enterprise-class management of distributed compute and big data applications on a scalable, shared grid. By providing a low-latency scheduling environment for heterogeneous workloads, Platform Symphony can help accelerate application workloads and help IT groups use resources more efficiently. Platform Symphony, available on its own and as a limited-use license in the IBM InfoSphere BigInsights software distribution, also makes it easy for organizations to run applications specifically designed for big data and to achieve higher levels of performance that facilitate rapid decision making.

With Platform Symphony Advanced Edition augmenting a supported Hadoop distribution, organizations can run their existing Hadoop MapReduce applications without modification. Platform Symphony does not replace Hadoop; it replaces only the standard batch scheduler included with the open-source Hadoop MapReduce distribution. Platform Symphony enhances Hadoop by providing a faster, low-latency MapReduce runtime layer and more reliable and flexible workload management. In other industries, Platform Symphony has been shown to substantially accelerate Hadoop MapReduce workloads. The goal of this benchmark was to demonstrate how Platform Symphony could deliver similar advantages for a life sciences workload.

The benchmark environment
This benchmark measured the relative performance of a Contrail model with and without Platform Symphony. Relatively little performance optimization was done for either the Hadoop-only case or the Hadoop plus Platform Symphony case. Existing lab hardware was used to conduct the tests, so the hardware environment may not have been optimal, but it was sufficient for this kind of simple comparative test.
Hardware
A Hadoop MapReduce cluster comprising multiple IBM rack-mount servers (see Figure 1) was used to support the benchmark. The cluster had a single head node and seven data nodes. The head node was a 2.6 GHz IBM System x3650 M4 server with 32 GB of memory. Six of the data nodes were IBM System x dx360 M4 servers, each configured with 64 GB of memory and 40 Gbps InfiniBand interconnects. The seventh data node was an IBM iDataPlex dx360 M3 server. All nodes were connected through a 40 Gbps InfiniBand switch, and the test ran IP over InfiniBand (IPoIB). A separate 1 Gb Ethernet network was used for node configuration and management.

Figure 1. IBM System x test environment for Contrail performance comparisons: a Mellanox InfiniBand switch connecting one IBM System x3650 M4 head node, six IBM dx360 M4 servers and one IBM dx360 M3 server.

Figure 2. The Platform Symphony management console cluster view.

Software
The cluster nodes all ran Red Hat Enterprise Linux 6.2. Hadoop was downloaded from apache.org and configured in accordance with instructions provided in the Platform Symphony release notes (see Figure 2). The Contrail software tested was the latest version available from apps/mediawiki/contrail-bio/index.php?title=contrail as of March. The Contrail code was installed based on instructions in the Contrail Quickstart Guide (available on the Contrail wiki). For the comparative test, Platform Symphony Version was used in conjunction with the Hadoop software above.

Tests were initially conducted with both an earlier Hadoop release and Hadoop 1.1.1, but it was judged more valid to focus on 1.1.1 because it was the more recent version. A significant difference between the two versions is the heartbeat interval: Hadoop 1.1.1 employs a more aggressive 0.3-second heartbeat interval, while the earlier release has a 3-second interval. For this reason, Hadoop 1.1.1 generally outperforms the earlier release on small clusters such as the test environment, where a fast heartbeat interval is reasonable.

Selecting the E.coli model
For this comparative benchmark testing, the Ecoli.10k file included in the data directory of the Contrail distribution was chosen as the basis for the test.[2] The benchmark team treated the E.coli model provided with the Contrail distribution as a black box and ran Contrail in accordance with the provided directions.

Test methodology
To simplify the benchmark testing, and to facilitate repeated runs with different data models and parameter settings, a shell script was developed (see the Appendix) to run the benchmark. Much of the logic of the script involves parsing the output of the Contrail runs for both the Hadoop-only and the Hadoop plus Platform Symphony cases, so that runtime details from repeated benchmark runs can be captured easily. Without this kind of automation, manually gathering statistics from repeated job runs so that they could be easily compared would have been tedious.
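The parsing step described above can also be sketched as a single awk pass. This is a minimal sketch, assuming a hypothetical log format in which each relevant line begins with a job ID starting with `job_` and the second whitespace-separated field is that job's length in whole seconds. The real Contrail output format may differ, and the appendix script itself uses grep, sed and a bash loop instead; `summarize_jobs` is an illustrative name.

```shell
#!/bin/bash
# summarize_jobs FILE: print job count, maximum and average job length.
# ASSUMED format: lines whose first field starts with "job_" carry the job
# length in seconds as the second field; all other lines are ignored.
summarize_jobs() {
  awk '
    BEGIN { n = 0; tot = 0; max = 0 }
    $1 ~ /^job_/ {
      n++
      tot += $2
      if ($2 > max) max = $2
    }
    END {
      if (n == 0) { print "Total jobs: 0"; exit }
      printf "Total jobs: %d\n", n
      printf "Maximum job length: %d seconds\n", max
      printf "Average job length: %.2f seconds\n", tot / n
    }' "$1"
}

# Example, using an output file name produced by the appendix script:
# summarize_jobs contrail.out.symphony
```

Doing the count, maximum and sum in one pass avoids the temporary file and the separate `bc` call, at the cost of tying the logic to one assumed line layout.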

The two test case configurations employed mostly default settings. The benchmark team did, however, change three variables in the Platform Symphony application profile for the Symphony MapReduce tenant under which the Contrail jobs ran. The application profiles were configured with these settings:
- prestartapplication=true
- tasklowwatermark=0.0
- taskhighwatermark=1.0

These settings are known to deliver better performance for Hadoop MapReduce workloads and would likely be the same settings used by organizations deploying such an application in production. They are standard settings explained in the Platform Symphony product documentation.

The benchmark execution script:
- Sets up the environment for both Hadoop and Platform Symphony
- Cleans up the Hadoop Distributed File System (HDFS) environment to make sure there is no data from prior runs
- Copies the E.coli model files into HDFS
- Runs the identical model twice: once using the Hadoop-only environment and once using the Hadoop plus Platform Symphony environment

Following these runs, the output files contrail.out.hadoop and contrail.out.symphony generated by the script were parsed to show comparative runtime statistics. For the Platform Symphony portion of the test, the running jobs could be monitored through the Platform Symphony management console (see Figure 3).

Figure 3. View of running Contrail jobs in the Platform Symphony management console.
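Because the two runs must be executed and measured the same way, the run step listed above can be wrapped in a small timing helper. This is a minimal sketch, assuming one-second resolution from `date +%s` is adequate for runs lasting minutes. `run_timed` is a hypothetical helper, not part of the appendix script, which instead reads the total duration that Contrail reports in its own output.

```shell
#!/bin/bash
# run_timed LABEL OUTFILE COMMAND [ARGS...]
# Runs COMMAND once, sends its stdout and stderr to OUTFILE, then prints the
# wall-clock time in whole seconds and the command's exit status.
run_timed() {
  local label=$1 outfile=$2
  shift 2
  local start end status
  start=$(date +%s)
  "$@" > "$outfile" 2>&1
  status=$?
  end=$(date +%s)
  echo "$label: $((end - start)) sec (exit $status)"
  return "$status"
}

# Hypothetical usage mirroring the two runs in the appendix script
# (the HDFS directory and input names here are placeholders):
# run_timed hadoop contrail.out.hadoop \
#   "$HADOOP_HOME/bin/hadoop" jar contrail.jar contrail.Contrail -asm asm.hadoop -k 25 -reads Ec10k
# run_timed symphony contrail.out.symphony \
#   "$PMR_BINDIR/mrsh" jar contrail.jar contrail.Contrail -asm asm.symphony -k 25 -reads Ec10k
```

Returning the wrapped command's exit status lets a caller abort on failure exactly as the appendix script does after each run.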

Results
Using standard Hadoop 1.1.1, the 53 Hadoop MapReduce jobs had a total runtime of 873 seconds. Using Hadoop in conjunction with Platform Symphony accelerated the calculation of the Contrail model, reducing the average job runtime to just 4.68 seconds and compressing the total runtime to just 258 seconds, a performance boost of roughly 3.4 times (873 / 258 ≈ 3.38). The captured script output is shown below, and Figure 4 shows the relative total runtimes in bar chart form.

Hadoop + Platform Symphony
Total jobs: 53
Maximum job length: 124 seconds
Average job length: 4.68 seconds
Total duration: 258 seconds

Hadoop only
Total jobs: 53
Maximum job length: 18 seconds
Total duration: 873 seconds

Figure 4. Contrail runtime for a subset of E.coli bacteria (10K reads): total runtime for 53 jobs (seconds), without and with Platform Symphony. Using Platform Symphony with Hadoop helped significantly reduce the workload's runtime.

Interpreting the results
While not all models will show similar performance gains, the observations in this test are consistent with a social media benchmark[3] in which Platform Symphony was shown to accelerate workloads by an average of 7.3 times. Generally, for latency-sensitive applications that involve many short-running jobs, Platform Symphony can help improve performance because of its low-latency scheduling architecture. As a result, organizations can either complete work faster or realize cost savings by deploying a smaller cluster to attain their performance objectives.
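The headline figure is simply the ratio of the two total runtimes, and a rough model shows why per-job scheduling latency matters when a workload consists of many short jobs. In this back-of-envelope sketch, `speedup` uses the measured totals reported above, while `heartbeat_overhead` assumes, purely for illustration, that each job idles for a fixed number of scheduler heartbeats (two here, a made-up figure) before its tasks start. The 3-second and 0.3-second intervals are the Hadoop heartbeat values quoted earlier for the two releases. Both function names are hypothetical and neither is part of the benchmark script.

```shell
#!/bin/bash
# Back-of-envelope arithmetic for the Contrail results (illustrative only).

# speedup HADOOP_SECONDS SYMPHONY_SECONDS: ratio of total runtimes, 2 decimals
speedup() {
  awk -v h="$1" -v s="$2" 'BEGIN { printf "%.2f\n", h / s }'
}

# heartbeat_overhead JOBS WAITS_PER_JOB INTERVAL_SECONDS
# HYPOTHETICAL model: every job idles for WAITS_PER_JOB heartbeat intervals
# before its tasks are assigned. Prints the total idle time in seconds.
heartbeat_overhead() {
  awk -v j="$1" -v w="$2" -v i="$3" 'BEGIN { printf "%.1f\n", j * w * i }'
}

speedup 873 258              # run 1 totals from this paper; prints 3.38
heartbeat_overhead 53 2 3    # 53 jobs, 2 assumed waits, 3 s heartbeat; prints 318.0
heartbeat_overhead 53 2 0.3  # same, 0.3 s heartbeat; prints 31.8
```

Under the made-up two-wait assumption, a 3-second heartbeat alone would add about five minutes of idle time across 53 jobs. This illustrates how scheduling latency, not just computation, can weigh on workloads of this shape.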

The additional benefits of Platform Symphony
Even though this effort focuses on comparing performance, Platform Symphony includes capabilities that can provide several additional advantages to life sciences organizations. For example:
- Proportional resource allocation: Organizations can run multiple MapReduce workloads concurrently, dynamically changing priorities and associated resource allocations in real time.
- Fast job pre-emption: Organizations can make sure critical workloads start and finish quickly while longer-running workloads continue to run in the background.
- Job recoverability: JobTracker execution is journaled so that jobs can resume where they left off in the event of a failure.
- Optional IBM General Parallel File System (IBM GPFS): Organizations running both MapReduce and non-MapReduce workloads can benefit from GPFS because it is a POSIX-compliant[4] file system that supports Hadoop MapReduce and non-MapReduce workloads concurrently accessing file system data, without the need to copy data in and out of the file system.
- Multi-mode clusters: Organizations running Hadoop MapReduce as well as traditional non-MapReduce workloads can configure individual clusters to support both Platform LSF and Platform Symphony. Platform LSF is a powerful workload management solution for running large, batch-oriented workloads. Running both Platform LSF and Platform Symphony on the same cluster can deliver additional flexibility and increase the number of life sciences applications that can efficiently share cluster resources.

Limitations and additional work
This test involved a single model; organizations could experience different results with different models or different numbers of reads. Results may also vary with the size of the cluster. Furthermore, the disk subsystem as configured was suboptimal for both test cases, and organizations might see different results with a more optimized file system configuration.
It is debatable whether this specific Contrail test should be described as a big data workload, since the actual files involved are relatively small by big data standards. The business advantage of using Hadoop MapReduce for this kind of workload, however, is clear: the MapReduce framework helps reduce the costs of performing de novo genome assembly by avoiding the need for costly systems with massive amounts of physical memory. Based on these tests, Platform Symphony builds on the inherent advantages of Contrail by providing an additional performance advantage.

Conclusion
As this testing indicates, life sciences organizations using Contrail can see a significant performance advantage by using the Platform Symphony scheduler in place of the standard scheduler included with the Hadoop MapReduce distribution. In the sample model comprising 10,000 reads, Platform Symphony accelerated the calculation of the Contrail result by 3.4 times. Because InfoSphere BigInsights 2.1 incorporates the IBM Platform Symphony scheduler, life sciences organizations considering deploying Hadoop MapReduce workloads along with other existing workloads should consider BigInsights as a platform for their big data applications.

Appendix: Shell script for benchmark testing
contrail-test.sh
This is the script used to control the execution of the benchmark.

```shell
#!/bin/bash
# Requires bash (arrays, [[ ]], (( ))), not plain sh.

usage() {
cat << EOF
usage: $0 -i <path> -o <path> [-k <int> -l <prefix>]

This script runs Contrail on the input data.

OPTIONS
   -i <path>     Path to the HDFS input directory
   -o <path>     Path to the HDFS output directory
   -k <int>      Value of K (default 25)
   -l <prefix>   Local outfile prefix (default contrail.out)
EOF
}

#
# Extract total duration from contrail output
get_duration() {
  local dur=$(grep "Duration:" "$1" | awk '{ print $3; }')
  echo "$dur"
}

#
# Parse Hadoop contrail output and print statistics
parse_hadoop() {
  local outfile=$1
  local jobpattern="job_"
  local tmpfile="_lengths.tmp"
  local max=0
  local tot=0
  local num=$(grep "$jobpattern" "$outfile" | wc -l)

  # Keep the text from the job ID onward; field 2 is then the job length
  grep "$jobpattern" "$outfile" | sed -E "s/(.*) ($jobpattern.*)/\2/g" | awk '{ print $2; }' > "$tmpfile"
  local jobs=( $(cat "$tmpfile") )
  rm -f "$tmpfile"

  for i in "${jobs[@]}"
  do
    if [ "$i" -gt "$max" ]
    then
      max=$i
    fi
    ((tot=tot+i))
  done

  local avg=$(echo "scale=4; $tot/$num" | bc)
  echo "Hadoop -- Total Jobs: $num"
  echo "Max Job Length: $max sec"
  echo "Avg Job Length: $avg sec"
}

#
# Parse Symphony contrail output and print statistics
parse_symphony() {
  local outfile=$1
  local jobpattern="^job_"
  local tmpfile="_lengths.tmp"
  local max=0
  local tot=0
  # ASSUMED: num and jobs are computed as in parse_hadoop, taking the job
  # length from field 2 of each line that starts with a job ID.
  local num=$(grep "$jobpattern" "$outfile" | wc -l)
  grep "$jobpattern" "$outfile" | awk '{ print $2; }' > "$tmpfile"
  local jobs=( $(cat "$tmpfile") )
  rm -f "$tmpfile"

  for i in "${jobs[@]}"
  do
    if [ "$i" -gt "$max" ]
    then
      max=$i
    fi
    ((tot=tot+i))
  done

  local avg=$(echo "scale=4; $tot/$num" | bc)
  echo "Symphony -- Total Jobs: $num"
  echo "Max Job Length: $max sec"
  echo "Avg Job Length: $avg sec"
}

HDFS_INPUT=
HDFS_OUTPUT=
CONTRAIL_K=25
PREFIX=contrail.out
CONTRAIL_JAR=contrail.jar   # needed by both the Hadoop and the Symphony runs

while getopts i:o:k:l: ARG
do
  case $ARG in
    i) HDFS_INPUT=$OPTARG ;;
    o) HDFS_OUTPUT=$OPTARG ;;
    k) CONTRAIL_K=$OPTARG ;;
    l) PREFIX=$OPTARG ;;
  esac
done

SYM_ASMDIR=${HDFS_OUTPUT}.symphony
SYM_OUTFILE=${PREFIX}.symphony
HADOOP_ASMDIR=${HDFS_OUTPUT}.hadoop
HADOOP_OUTFILE=${PREFIX}.hadoop

if [[ -z $HDFS_INPUT ]] || [[ -z $HDFS_OUTPUT ]]
then
  usage
  exit 1
fi

if [[ -z ${HADOOP_HOME} ]]
then
  echo "HADOOP_HOME not defined."
  exit 1
fi

if [[ -z ${PMR_BINDIR} ]]
then
  echo "PMR_BINDIR not defined."
  exit 1
fi

# Remove input and output left over from prior runs
echo "Cleaning HDFS:${HDFS_INPUT}"
${HADOOP_HOME}/bin/hadoop fs -rmr ${HDFS_INPUT}
echo "Cleaning HDFS:${HADOOP_ASMDIR}"
${HADOOP_HOME}/bin/hadoop fs -rmr ${HADOOP_ASMDIR}
echo "Cleaning HDFS:${SYM_ASMDIR}"
${HADOOP_HOME}/bin/hadoop fs -rmr ${SYM_ASMDIR}

echo "Copying input files to HDFS:${HDFS_INPUT}"
${HADOOP_HOME}/bin/hadoop fs -mkdir ${HDFS_INPUT}
${HADOOP_HOME}/bin/hadoop fs -copyFromLocal Ec10k.sim[12].fq ${HDFS_INPUT}

# Run 1: Hadoop only, using the standard Hadoop scheduler
echo "Running contrail (K=${CONTRAIL_K}) on Hadoop"
echo "======= Redirecting all output to ${HADOOP_OUTFILE} in the current directory"
${HADOOP_HOME}/bin/hadoop jar ${CONTRAIL_JAR} contrail.Contrail -asm ${HADOOP_ASMDIR} -k ${CONTRAIL_K} -reads ${HDFS_INPUT} &> ${HADOOP_OUTFILE}
if [ $? -ne 0 ]
then
  echo "ERROR: Hadoop execution failed. Aborting..."
  exit 1
fi

# Run 2: the same model through the Platform Symphony MapReduce runtime (mrsh)
echo "Running contrail (K=${CONTRAIL_K}) on Symphony"
echo "======= Redirecting all output to ${SYM_OUTFILE} in the current directory"
$PMR_BINDIR/mrsh jar ${CONTRAIL_JAR} contrail.Contrail -asm ${SYM_ASMDIR} -k ${CONTRAIL_K} -reads ${HDFS_INPUT} &> ${SYM_OUTFILE}
if [ $? -ne 0 ]
then
  echo "ERROR: Symphony execution failed. Aborting..."
  exit 1
fi

echo
parse_symphony ${SYM_OUTFILE}
SYMPHONY_DUR=$(get_duration ${SYM_OUTFILE})
echo "Total Duration: ${SYMPHONY_DUR} sec"
echo
parse_hadoop ${HADOOP_OUTFILE}
HADOOP_DUR=$(get_duration ${HADOOP_OUTFILE})
echo "Total Duration: ${HADOOP_DUR} sec"
echo
SPEEDUP=$(echo "scale=4; ${HADOOP_DUR}/${SYMPHONY_DUR}" | bc)
echo "Symphony Speedup: ${SPEEDUP}x"
```

Actual benchmark results captured over three successive comparative runs
Note that the second and third test results were discarded because the Platform Symphony results were substantially better than in the first run, likely because of caching effects: Platform Symphony can persist services between runs.
| Run | Configuration | Total jobs | Maximum job length (s) | Average job length (s) | Total duration (s) |
|---|---|---|---|---|---|
| 1 | Platform Symphony | 53 | 124 | 4.68 | 258 |
| 1 | Hadoop only | 53 | 18 | | 873 |
| 2 | Platform Symphony | 53 | 4 | | 142 |
| 2 | Hadoop only | 53 | 20 | | 871 |
| 3 | Platform Symphony | 53 | 4 | | 142 |
| 3 | Hadoop only | 53 | 20 | | 871 |

Symphony speedup (Hadoop total duration divided by Platform Symphony total duration): 3.38 times in run 1 and 6.13 times in runs 2 and 3.

Hadoop configuration files

core-site.xml

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/data</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://atsplat2.private:19000/</value>
  </property>
</configuration>
```

hdfs-site.xml

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

mapred-site.xml

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>atsplat2.private:19001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>15</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>15</value>
  </property>
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>
```
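The mapred-site.xml settings also imply a worst-case memory footprint per data node. The quick check below assumes every map and reduce slot on a TaskTracker is busy at once and counts only the -Xmx heap, not JVM overhead, the TaskTracker and DataNode daemons, or the operating system. The variable and function names are illustrative; the three values are copied from the configuration.

```shell
#!/bin/bash
# Worst-case child-JVM heap per TaskTracker implied by mapred-site.xml.
# Counts only the -Xmx heap of concurrently running map and reduce tasks.
MAP_SLOTS=15        # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS=15     # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB=2048  # -Xmx2048m in mapred.map/reduce.child.java.opts

# max_heap_mb MAP_SLOTS REDUCE_SLOTS HEAP_MB: total heap in MB
max_heap_mb() {
  echo $(( ($1 + $2) * $3 ))
}

max_heap_mb "$MAP_SLOTS" "$REDUCE_SLOTS" "$CHILD_HEAP_MB"   # prints 61440 (MB, i.e. 60 GB)
```

On a 64 GB dx360 M4 node, 60 GB of potential child-JVM heap would leave roughly 4 GB for everything else if all 30 slots ran at full heap. The memory of the seventh (dx360 M3) node is not stated.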

For more information
To learn more about Contrail, visit: contrail-bio;a=tree
For more information about IBM Platform Symphony, visit: ibm.com/platformcomputing/products/symphony
For more information about IBM InfoSphere BigInsights and other IBM big data solutions, contact your IBM representative or IBM Business Partner, or visit: ibm.com/software/data/infosphere/biginsights

Copyright IBM Corporation 2013
IBM Corporation
Systems and Technology Group
Route 100
Somers, NY
Produced in the United States of America
June 2013

IBM, the IBM logo, ibm.com, BigInsights, GPFS, iDataPlex, InfoSphere, LSF, Platform, and System x are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

This document is current as of the initial date of publication and may be changed by IBM at any time.

The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.
[1] While the science of genome assembly is outside of the scope of this paper, interested parties can learn more about Contrail by visiting:
[2] For details about the 10K read E.coli model included with the Contrail software distribution, visit: gitweb.cgi?p=contrail-bio/contrail-bio;a=tree. Groundbreaking work on the E.coli K-12 strain MG1655 was done at the University of Wisconsin. For more information, visit
[3] For an audited STAC Report commissioned by IBM, visit: ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/highperfhadoop.html
[4] Portable Operating System Interface for UNIX. See for details.

DCW03047USEN-01


An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Zend and IBM: Bringing the power of PHP applications to the enterprise

Zend and IBM: Bringing the power of PHP applications to the enterprise Zend and IBM: Bringing the power of PHP applications to the enterprise A high-performance PHP platform that helps enterprises improve and accelerate web and mobile application development Highlights: Leverages

More information

IBM SmartCloud Workload Automation

IBM SmartCloud Workload Automation IBM SmartCloud Workload Automation Highly scalable, fault-tolerant solution offers simplicity, automation and cloud integration Highlights Gain visibility into and manage hundreds of thousands of jobs

More information

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Scalable Cloud Computing Solutions for Next Generation Sequencing Data Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of

More information

IBM Storwize V7000 Unified and Storwize V7000 storage systems

IBM Storwize V7000 Unified and Storwize V7000 storage systems IBM Storwize V7000 Unified and Storwize V7000 storage systems Transforming the economics of data storage Highlights Meet changing business needs with virtualized, enterprise-class, flashoptimized modular

More information

IBM Analytics. Just the facts: Four critical concepts for planning the logical data warehouse

IBM Analytics. Just the facts: Four critical concepts for planning the logical data warehouse IBM Analytics Just the facts: Four critical concepts for planning the logical data warehouse 1 2 3 4 5 6 Introduction Complexity Speed is businessfriendly Cost reduction is crucial Analytics: The key to

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

IBM Unstructured Data Identification and Management

IBM Unstructured Data Identification and Management IBM Unstructured Data Identification and Management Discover, recognize, and act on unstructured data in-place Highlights Identify data in place that is relevant for legal collections or regulatory retention.

More information

IBM Tivoli Storage Manager Suite for Unified Recovery

IBM Tivoli Storage Manager Suite for Unified Recovery IBM Tivoli Storage Manager Suite for Unified Recovery Comprehensive data protection software with a broad choice of licensing plans Highlights Optimize data protection for virtual servers, core applications

More information

IBM Software Cloud service delivery and management

IBM Software Cloud service delivery and management IBM Software Cloud service delivery and management Rethink IT. Reinvent business. 2 Cloud service delivery and management Virtually unparalleled change and complexity On this increasingly instrumented,

More information

Move beyond monitoring to holistic management of application performance

Move beyond monitoring to holistic management of application performance Move beyond monitoring to holistic management of application performance IBM SmartCloud Application Performance Management: Actionable insights to minimize issues Highlights Manage critical applications

More information

IBM SmartCloud Monitoring

IBM SmartCloud Monitoring IBM SmartCloud Monitoring Gain greater visibility and optimize virtual and cloud infrastructure Highlights Enhance visibility into cloud infrastructure performance Seamlessly drill down from holistic cloud

More information

Taking control of the virtual image lifecycle process

Taking control of the virtual image lifecycle process IBM Software Thought Leadership White Paper March 2012 Taking control of the virtual image lifecycle process Putting virtual images to work for you 2 Taking control of the virtual image lifecycle process

More information

Cloud storage is strategically inevitable

Cloud storage is strategically inevitable Cloud storage is strategically inevitable IBM can help in preparing for a successful cloud storage deployment Highlights Use cloud technology to enable speed and innovation by empowering users and communities

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

Build more and grow more with Cloudant DBaaS

Build more and grow more with Cloudant DBaaS IBM Software Brochure Build more and grow more with Cloudant DBaaS Next generation data management designed for Web, mobile, and the Internet of things Build more and grow more with Cloudant DBaaS New

More information

Using the cloud to improve business resilience

Using the cloud to improve business resilience IBM Global Technology Services White Paper IBM Business Continuity and Resiliency Services Using the cloud to improve business resilience Choose the right managed services provider to limit reputational

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

z/os V1R11 Communications Server system management and monitoring

z/os V1R11 Communications Server system management and monitoring IBM Software Group Enterprise Networking Solutions z/os V1R11 Communications Server z/os V1R11 Communications Server system management and monitoring z/os Communications Server Development, Raleigh, North

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) (

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) ( White Paper Revolution R Enterprise: Faster Than SAS Benchmarking Results by Thomas W. Dinsmore and Derek McCrae Norton In analytics, speed matters. How much? We asked the director of analytics from a

More information

A financial software company

A financial software company A financial software company Projecting USD10 million revenue lift with the IBM Netezza data warehouse appliance Overview The need A financial software company sought to analyze customer engagements to

More information

IBM Software Integrating and governing big data

IBM Software Integrating and governing big data IBM Software big data Does big data spell big trouble for integration? Not if you follow these best practices 1 2 3 4 5 Introduction Integration and governance requirements Best practices: Integrating

More information

SAS deployment on IBM Power servers with IBM PowerVM dedicated-donating LPARs

SAS deployment on IBM Power servers with IBM PowerVM dedicated-donating LPARs SAS deployment on IBM Power servers with IBM PowerVM dedicated-donating LPARs Narayana Pattipati IBM Systems and Technology Group ISV Enablement January 2013 Table of contents Abstract... 1 IBM PowerVM

More information

IBM Tivoli Netcool Configuration Manager

IBM Tivoli Netcool Configuration Manager IBM Netcool Configuration Manager Improve organizational management and control of multivendor networks Highlights Automate time-consuming device configuration and change management tasks Effectively manage

More information

IBM BladeCenter S Big benefits for the small office

IBM BladeCenter S Big benefits for the small office IBM BladeCenter S Big benefits for the small office Highlights All in one integrates servers, SAN storage, networking and I/O into a single chassis No special wiring needed uses standard office power plugs

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

IBM Storwize V7000: For your VMware virtual infrastructure

IBM Storwize V7000: For your VMware virtual infrastructure IBM Storwize V7000: For your VMware virtual infrastructure Innovative midrange disk system leverages integrated storage technologies Highlights Complement server virtualization, extending cost savings

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST)

NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST) NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop September 2014 Dylan Yaga NIST/ITL CSD Lead Software Designer Fernando Podio NIST/ITL CSD Project Manager National Institute of Standards

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

IBM Cognos Enterprise: Powerful and scalable business intelligence and performance management

IBM Cognos Enterprise: Powerful and scalable business intelligence and performance management : Powerful and scalable business intelligence and performance management Highlights Arm every user with the analytics they need to act Support the way that users want to work with their analytics Meet

More information

Delivering new insights and value to consumer products companies through big data

Delivering new insights and value to consumer products companies through big data IBM Software White Paper Consumer Products Delivering new insights and value to consumer products companies through big data 2 Delivering new insights and value to consumer products companies through big

More information

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP selects SAP HANA to improve the speed of business analytics with IBM and SAP Founded in 1806, is a global consumer products company which sells nearly $17 billion annually in personal care, home care,

More information

Big Data Evaluator 2.1: User Guide

Big Data Evaluator 2.1: User Guide University of A Coruña Computer Architecture Group Big Data Evaluator 2.1: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño May 5, 2016 Contents 1 Overview 3

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters

Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters CONNECT - Lab Guide Deploy Apache Hadoop with Emulex OneConnect OCe14000 Ethernet Network Adapters Hardware, software and configuration steps needed to deploy Apache Hadoop 2.4.1 with the Emulex family

More information

IBM PureFlex System. The infrastructure system with integrated expertise

IBM PureFlex System. The infrastructure system with integrated expertise IBM PureFlex System The infrastructure system with integrated expertise 2 IBM PureFlex System IT is moving to the strategic center of business Over the last 100 years information technology has moved from

More information

HADOOP CLUSTER SETUP GUIDE:

HADOOP CLUSTER SETUP GUIDE: HADOOP CLUSTER SETUP GUIDE: Passwordless SSH Sessions: Before we start our installation, we have to ensure that passwordless SSH Login is possible to any of the Linux machines of CS120. In order to do

More information

Fiserv. Saving USD8 million in five years and helping banks improve business outcomes using IBM technology. Overview. IBM Software Smarter Computing

Fiserv. Saving USD8 million in five years and helping banks improve business outcomes using IBM technology. Overview. IBM Software Smarter Computing Fiserv Saving USD8 million in five years and helping banks improve business outcomes using IBM technology Overview The need Small and midsize banks and credit unions seek to attract, retain and grow profitable

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK

ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK KEY FEATURES PROVISION FROM BARE- METAL TO PRODUCTION QUICKLY AND EFFICIENTLY Controlled discovery with active control of your hardware Automatically

More information

IBM Tivoli Storage FlashCopy Manager

IBM Tivoli Storage FlashCopy Manager IBM Storage FlashCopy Manager Online, near-instant snapshot backup and restore of critical business applications Highlights Perform near-instant application-aware snapshot backup and restore, with minimal

More information

IBM Tivoli Endpoint Manager for Lifecycle Management

IBM Tivoli Endpoint Manager for Lifecycle Management IBM Endpoint Manager for Lifecycle Management A single-agent, single-console approach for endpoint management across the enterprise Highlights Manage hundreds of thousands of endpoints regardless of location,

More information

Safeguarding the cloud with IBM Dynamic Cloud Security

Safeguarding the cloud with IBM Dynamic Cloud Security Safeguarding the cloud with IBM Dynamic Cloud Security Maintain visibility and control with proven security solutions for public, private and hybrid clouds Highlights Extend enterprise-class security from

More information

High Availability of the Polarion Server

High Availability of the Polarion Server Polarion Software CONCEPT High Availability of the Polarion Server Installing Polarion in a high availability environment Europe, Middle-East, Africa: Polarion Software GmbH Hedelfinger Straße 60 70327

More information

Reducing the cost and complexity of endpoint management

Reducing the cost and complexity of endpoint management IBM Software Thought Leadership White Paper October 2014 Reducing the cost and complexity of endpoint management Discover how midsized organizations can improve endpoint security, patch compliance and

More information

Premier. Helping healthcare providers deliver the best possible care to their patients. Smart is...

Premier. Helping healthcare providers deliver the best possible care to their patients. Smart is... Premier Helping healthcare providers deliver the best possible care to their patients Smart is... Sharing and analyzing healthcare information to help physicians identify the best treatments for their

More information

IBM Storwize Rapid Application Storage solutions

IBM Storwize Rapid Application Storage solutions IBM Storwize Rapid Application Storage solutions Efficient, integrated, pretested and powerful solutions to accelerate deployment and return on investment. Highlights Improve disk utilization by up to

More information

Boosting enterprise security with integrated log management

Boosting enterprise security with integrated log management IBM Software Thought Leadership White Paper May 2013 Boosting enterprise security with integrated log management Reduce security risks and improve compliance across diverse IT environments 2 Boosting enterprise

More information

IBM WebSphere Application Server Family

IBM WebSphere Application Server Family IBM IBM Family Providing the right application foundation to meet your business needs Highlights Build a strong foundation and reduce costs with the right application server for your business needs Increase

More information

IBM InfoSphere Information Server Ready to Launch for SAP Applications

IBM InfoSphere Information Server Ready to Launch for SAP Applications IBM Information Server Ready to Launch for SAP Applications Drive greater business value and help reduce risk for SAP consolidations Highlights Provides a complete solution that couples data migration

More information

Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights

Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights Gord Sissons Senior Manager, Technical Marketing IM Platform Computing gsissons@ca.ibm.com Agenda Some Context IM Platform Computing

More information

Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse

Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load

More information

DeIC Watson Agreement - hvad betyder den for DeIC medlemmerne

DeIC Watson Agreement - hvad betyder den for DeIC medlemmerne DeIC Watson Agreement - hvad betyder den for DeIC medlemmerne Preben Jacobsen Solution Architect Nordic Lead, Software Defined Infrastructure Group IBM Danmark 2014 IBM Corporation Link: https://www.youtube.com/watch?v=_xcmh1lqb9i

More information

Improving Grid Processing Efficiency through Compute-Data Confluence

Improving Grid Processing Efficiency through Compute-Data Confluence Solution Brief GemFire* Symphony* Intel Xeon processor Improving Grid Processing Efficiency through Compute-Data Confluence A benchmark report featuring GemStone Systems, Intel Corporation and Platform

More information

IBM RATIONAL PERFORMANCE TESTER

IBM RATIONAL PERFORMANCE TESTER IBM RATIONAL PERFORMANCE TESTER Today, a major portion of newly developed enterprise applications is based on Internet connectivity of a geographically distributed work force that all need on-line access

More information

Effective Storage Management for Cloud Computing

Effective Storage Management for Cloud Computing IBM Software April 2010 Effective Management for Cloud Computing April 2010 smarter storage management Page 1 Page 2 EFFECTIVE STORAGE MANAGEMENT FOR CLOUD COMPUTING Contents: Introduction 3 Cloud Configurations

More information

A Survey of Shared File Systems

A Survey of Shared File Systems Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications A Survey of Shared File Systems A Survey of Shared File Systems Table of Contents Introduction...

More information