Scaling from 1 PC to a super computer using Mascot



Similar documents
High Throughput Proteomics

IncidentMonitor Server Specification Datasheet

Sockets, cores and threads: Choosing PC hardware for Mascot

Learning Objectives. Chapter 1: Networking with Microsoft Windows 2000 Server. Basic Network Concepts. Learning Objectives (continued)

Dragon Medical Enterprise Network Edition Technical Note: Requirements for DMENE Networks with virtual servers

Terminal Server Software and Hardware Requirements. Terminal Server. Software and Hardware Requirements. Datacolor Match Pigment Datacolor Tools

Hardware/Software Guidelines

IOS110. Virtualization 5/27/2014 1

Oracle Database Scalability in VMware ESX VMware ESX 3.5

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)

Enterprise Planning Large Scale ARGUS Enterprise /29/2015 ARGUS Software An Altus Group Company

Windows Server Performance Monitoring

SERVER CLUSTERING TECHNOLOGY & CONCEPT

LabStats 5 System Requirements

Imaging Computing Server User Guide

Cloud computing is a marketing term that means different things to different people. In this presentation, we look at the pros and cons of using

3 - Introduction to Operating Systems

MS SQL Installation Guide

Workstation Virtualization Software Review. Matthew Smith. Office of Science, Faculty and Student Team (FaST) Big Bend Community College

Seradex White Paper. Focus on these points for optimizing the performance of a Seradex ERP SQL database:

Clusters: Mainstream Technology for CAE

PRIMERGY server-based High Performance Computing solutions

Professional and Enterprise Edition. Hardware Requirements

Best Practices for Virtualised SharePoint

MedInformatix System Requirements

Intro to Virtualization

Practical issues in DIY RAID Recovery

Parallels Virtuozzo Containers

Tekla Structures 18 Hardware Recommendation

A Comparison of VMware and {Virtual Server}

Server Scalability and High Availability

Enterprise Edition Technology Overview

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

System Requirements Date 2014/09/22

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content

4.1 Introduction 4.2 Explain the purpose of an operating system Describe characteristics of modern operating systems Control Hardware Access

Selling Virtual Private Servers. A guide to positioning and selling VPS to your customers with Heart Internet

Test Object. IO Drivers

Distribution One Server Requirements

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

HP reference configuration for entry-level SAS Grid Manager solutions

wu.cloud: Insights Gained from Operating a Private Cloud System

INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering

Achieving business benefits through automated software testing. By Dr. Mike Bartley, Founder and CEO, TVS

Network Station - Thin Client Computing - Overview

Parallels Desktop 4 for Windows and Linux Read Me

Running Windows on a Mac. Why?


Larger, active workgroups (or workgroups with large databases) must use one of the full editions of SQL Server.

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Xactimate v.27 Network Installation

ZCP 7.0 (build 41322) Zarafa Collaboration Platform. Zarafa Archiver Deployment Guide

Proval LS Database & Client Software (Trial or Full) Installation Guide

Enterprise Edition. Hardware Requirements

Virtualization of CBORD Odyssey PCS and Micros 3700 servers. The CBORD Group, Inc. January 13, 2007

Analysis and Implementation of Cluster Computing Using Linux Operating System

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

Improved LS-DYNA Performance on Sun Servers

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

An Introduction to High Performance Computing in the Department

Adonis Technical Requirements

vrealize Business System Requirements Guide

Dragon NaturallySpeaking and citrix. A White Paper from Nuance Communications March 2009

Functions of NOS Overview of NOS Characteristics Differences Between PC and a NOS Multiuser, Multitasking, and Multiprocessor Systems NOS Server

Recommended hardware system configurations for ANSYS users

Vocera Voice 4.3 and 4.4 Server Sizing Matrix

Time Zone Sensitive. Query Based Dialing. SPD PRO with MySQL allows you to dial by customized query such as area code, zip code, age, and much more!

Server Virtualization with VMWare

System Environment Specifications Network, PC, Peripheral & Server Requirements

Hardware Requirements

NIVEO Network Attached Storage Series NNAS-D5 NNAS-R4. More information:

VMWARE WHITE PAPER 1

NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions

Network operating systems typically are used to run computers that act as servers. They provide the capabilities required for network operation.

Chapter 7A. Functions of Operating Systems. Types of Operating Systems. Operating System Basics

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Simplify Data Management and Reduce Storage Costs with File Virtualization

DB2 Connect for NT and the Microsoft Windows NT Load Balancing Service

Windows Compute Cluster Server Miron Krokhmal CTO

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

The Trouble with Backups

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

nexsan NAS just got faster, easier and more affordable.

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Virtual Server and Storage Provisioning Service. Service Description

Virtualization: What does it mean for SAS? Karl Fisher and Clarke Thacher, SAS Institute Inc., Cary, NC

Business Storage 4-Bay Rackmount NAS

Contents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11

Softline VIP Payroll System Requirements v2.9a January 2010

Planning Your Installation or Upgrade

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

IBM TotalStorage Network Attached Storage 100

Cloud Computing through Virtualization and HPC technologies

Module 5 Introduction to Processes and Controls

Hardware and Software Requirements for Installing California.pro

Introduction to Network Management

Minimum Hardware Specifications Upgrades

ABB Technology Days Fall 2013 System 800xA Server and Client Virtualization. ABB Inc 3BSE en. October 29, 2013 Slide 1

Adapt Support Managed Service Programs

A+ Guide to Software: Managing, Maintaining, and Troubleshooting, 5e. Chapter 3 Installing Windows

Transcription:

Scaling from 1 PC to a super computer using Mascot 1

Mascot - options for high throughput When is more processing power required? Options: Multiple Mascot installations Using a large Unix server Clusters - hardware and software configurations In this session I am going to talk about the three main options available if you need higher throughput for Mascot searches than is available on a standard PC. So firstly I will describe how to determine if you need more processing power and if so how much. Then I will discuss the three options: - Multiple installations, which we don t generally recommend - using a large Unix server - using a cluster 2

When is more processing power required For high throughput and quick response For large no-enzyme searches, or searches against a number of very large databases from a single instrument For multiple instruments So firstly, why might you need more processing power. You may think that this is all very obvious, but I can assure you that we get a fairly equal number of requests for quotes for overpowered and underpowered systems. The main reasons are listed here: Clearly if you have over 50 instruments running in a high throughput environment like... 3

GeneProt, then a single PC running Windows is not going to be up to the job. 4

Worst case conditions Large sequence database human genome dbest Large MS dataset 10,000-100,000 spectra Inaccurate / noisy data Large numbers of P-T modifications No enzyme specificity So what are the worst case conditions - The human genome has approximately 3 Gigabases, and the complete dbest is now over twice that size. Searching these databases is clearly going to be extremely computationally intensive. However, even a single instrument running a MUDPIT experiment can produce in excess of 10,000 spectra. Inaccurate or noisy data can mean using large tolerances, and this will affect the search speed. Large numbers of post translational or chemical modifications (selected as variable modification) will also increase the search speed. There is no time penalty for fixed modifications. Lastly, a search with no enzyme specificity will dramatically increase search times. 5

How many CPUs do I need? It is possible to make a reasonably accurate estimate if you have enough information about the types of search you want to perform... Search speed is almost exactly proportional to the number of CPUs and CPU speed Try a search on Matrix Science public web site at weekend Send us data and ask for timing - please ask first! Estimating the number of CPU's required to achieve a given throughput is complex. One option is to try typical searches on our web site. 6

Using multiple Mascot installations One Mascot server per instrument but More administration required - e.g. for database updates, backups of data, configuration changes etc. Cannot perform a very large search rapidly Harder to find old results files - need to remember which Mascot server was used So the first, and most obvious solution is to have multiple Mascot installations - for example one per instrument, or one per user. However, we would generally recommend against this because 7

Windows / Linux SMP systems Windows 2000 Professional, XP and NT 4 Workstation support 2 processors Windows 2000 / NT4 Server for > 2 processors Linux smp 4 processor Intel Box expensive - e.g $30k for a 4 processor 1.6 GHz PowerEdge 6650 2Gb memory limit... For a simple Intel or Athlon based PC, the maximum number of processors that we would recommend is 4. With Windows 2000 professional, XP and NT4, the limit is 2 processors. 8

The big Unix Box solution So then we have the big Unix Box solution.the four main manufacturers are shown here, and each system has its merits. The advantage of these systems is that they really can scale.... 9

And what s more, they can now easily be purchased over the internet 10

Anybody got a credit card I can borrow! 11

32 Processor Unix Server Sequences per second 16000 14000 12000 10000 8000 6000 4000 2000 0 Actual performance Perfect scaling 0 5 10 15 20 25 30 35 Number of processors This graph shows how well one of these systems can scale. This graph is for a 32 processors Unix box, and the yellow line is showing perfect scaling. Once we get up to about 32 processors, it starts to tail off a little. 12

Advantages of a large Unix system Often company fasta files compiled on a large Unix box Robust / excellent support / well maintained Some systems can scale to more processors In house support from IT / Bioinformatics Large amount of memory Easier to maintain and support than a cluster 13

Disadvantages of a large Unix box Older systems can be slow - 2 year lifetime Price 14

A cheaper way? Best value for money computers are dual processor systems A cluster of single or dual processor computers is a cost effective way of increasing throughput A cluster of standard PC's is a very cost effective and scalable solution 15

TCP/IP "External" Intranet TCP/IP Mascot master 100Base-T Hub Mascot Cluster Topology TCP/IP TCP/IP TCP/IP Mascot node 1 Mascot node 2 Mascot node 3 Mascot supports cluster operation using a Beowulf-like topology. Mascot supports cluster mode as standard, whenever the licence is for 4 CPU's or more. You just have to hook up an appropriate number of PC's on a local LAN 16

Matrix Science Web server 19 rack-mountable chassis containing 10 fast Intel processors Redundant power supplies We run Windows on the nodes, and the Master computer runs FreeBSD 2Gb RAM per node Hot-swappable Similar modules available for sale as a turn-key package Alternatively, we can supply a complete turn-key solution based on rack mounted, high-reliability PC hardware. Our web server uses this hardware 17

Software setup - Windows Install Windows 2000 workstation on each node Install Windows 2000 server on master Run Mascot installation: Setup under Windows is extremely simple 18

Cluster software setup - Windows If the licence is for 4 CPU's or more, you will be presented with additional dialogs for configuring the cluster nodes 19

Cluster software setup - Unix Install Unix version of Mascot Edit nodelist.txt and a couple of scripts Applications and configuration files are all transferred to each of the nodes automatically Under Unix, its just a case of editing two configuration files 20

Functionality of Mascot cluster Software automatically installed onto each node - no need to pre-load each system Sequence databases and all configuration files distributed / updated automatically Each search runs on every node so there is an (almost) linear increase in search speed Heart beat health check Automatic recovery if a node fails Spare nodes are supported Distributed memory System errors emailed to administrator Cluster support includes many features to support large, high throughput systems 21

Status report "At a glance" status screen 22

What are the limits... Less than 4 processors makes little sense Limit of 1024 CPUs per sub-cluster With Mascot version 1.8, practical limit of about 100 dual processor 32 bit nodes Limit of 4096 nodes in total Of course, there have to be some limits 23

GeneProt have the largest Mascot cluster. 1440 Alpha CPU's configured as a number of sub-clusters 24

25