
Likely Evolution of Computational Infrastructure for Bio Beam Lines Over the Next Five Years

Dieter K. Schneider, BNL Biology and the PXRR at the NSLS
BNL, April 21, 2010

With expert advice from:
- James Pflugrath, Rigaku USA, The Woodlands, TX
- Clemens Vonrhein, Global Phasing Ltd, Cambridge, UK
- Chris Nielson, Area Detector Systems Corporation, Poway, CA
- John Skinner, PXRR, BNL Biology
- Matt Cowan, PXRR, BNL Biology

PXRR: Robert Sweet, Annie Héroux, Allen Orville, Howard Robinson, Alexei Soares, Deborah Stoner-Ma

The macromolecular crystallographer's beam line at the NSLS

- A state-of-the-art CCD detector records diffraction patterns.
- A goniometer centers the crystal in the beam; the spindle rotates the crystal during collections.
- An automounter handles cryogenic specimens mounted on crystal caps.
- Crystals are pre-mounted and small: 10-100 µm.

MX experiment: from data to structure to print

Hundreds of diffraction patterns record thousands of Bragg spots. Once phased, they yield an electron density map of the macromolecule, and then a Science paper.

Plans for MX beam lines at NSLS-II

Day 1:
- Microdiffraction undulator line
- Highly automated undulator line
- Complementary-methods 3-pole-wiggler (3PW) beam line

Soon after:
- High-energy-resolution undulator line
- Complementary-methods undulator line
- Screening and training 3PW line

Fully built MX capabilities: 4 ID beam lines, 2 3PW beam lines.

New detectors enable new data collection methods

                              NSLS                     NSLS-II
Detector technology           CCD                      Composite pixel array
Detector model                ADSC 315r                Pilatus 6M
Effective dynamic range       15 bits                  20 bits
Dead time between frames      0.25 s + overhead = 1 s  0.0036 s
Electronic background         0.015 e-/pixel/s         0

Fast readout and zero electronic background make it possible to collect data without operating the shutter, in a continuous sweep of the crystal spindle.
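The dead-time figures above translate directly into beam-time efficiency, which is what makes shutterless continuous sweeps attractive. A minimal sketch (Python; the per-frame exposure times of 1 s and 0.083 s are taken from the data-rate table on the next slide):

```python
# Duty-cycle estimate from the dead-time figures above.
# Exposure times per frame come from the following data-rate slide:
# 1 s wide slicing on the NSLS CCD, 0.083 s fine slicing on the Pilatus 6M.

def duty_cycle(exposure_s, dead_s):
    """Fraction of wall-clock time spent actually exposing the crystal."""
    return exposure_s / (exposure_s + dead_s)

ccd = duty_cycle(1.0, 1.0)       # ADSC 315r: ~1 s readout + shutter overhead
pad = duty_cycle(0.083, 0.0036)  # Pilatus 6M: 3.6 ms readout gap

print(f"CCD duty cycle:     {ccd:.0%}")  # ~half the beam time lost to readout
print(f"Pilatus duty cycle: {pad:.0%}")  # gap negligible -> shutterless sweep
```

At roughly 50% versus 96% exposure efficiency, the pixel-array detector wastes almost no flux between frames, so the shutter never needs to close during a sweep.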

The high flux at NSLS-II and fast detectors will tax cheap networking

                              NSLS                     NSLS-II
Beam intensity on crystal     10^11 ph/s/µm^2          >10^12 ph/s/µm^2
Detector model                ADSC 315r                Pilatus 6M
Frames per data collection    360 x 1° (wide slicing)  1800 x 0.2° (fine slicing)
Exposure time per frame       1 s                      0.083 s @ 12 Hz
Time per collection           720 s = 12 min           150 s = 2.5 min
Detector format in pixels     3088 x 3088 (I*2)        2463 x 2527 (I*4)
Size per frame                19 MB                    25 MB (uncompressed)
Size of 360° data set         6.9 GB                   45 GB (uncompressed)
Stored size of 360° data set  6.9 GB                   11 GB (compressed 4X)
Data transfer rate            0.076 Gb/s               1.3 Gb/s

This is beyond the bandwidth of Gigabit networking!

Data rate assessment with a more appropriate 2015 detector

                              NSLS                     NSLS-II
Beam intensity on crystal     10^11 ph/s/µm^2          >10^12 ph/s/µm^2
Detector model                ADSC 315r                Eiger-class detector
Frames per data collection    360 x 1° (wide slicing)  1800 x 0.2° (fine slicing)
Exposure time per frame       1 s                      0.020 s @ 50 Hz (max 200 Hz)
Time per collection           720 s = 12 min           36 s ~ 0.5 min
Detector format in pixels     3088 x 3088 (I*2)        5 Mpixel (I*4)
Size per frame                19 MB                    20 MB (uncompressed)
Size of 360° data set         6.9 GB                   36 GB (uncompressed)
Stored size of 360° data set  6.9 GB                   9 GB (compressed 4X)
Data transfer rate            0.076 Gb/s               2 Gb/s

This is within the bandwidth of 10 Gigabit networking!
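The sustained transfer rates in these tables follow from one division: the stored data-set size must move off the detector in roughly the time it takes to collect the next one. A small sketch (Python, figures copied from the tables) reproduces the 0.076 Gb/s and 2 Gb/s entries:

```python
# Sustained network rate if a stored data set must be moved off the
# detector within one collection time. Sizes in decimal GB, times in s.

def rate_gbps(dataset_gb, collect_s):
    """Required sustained transfer rate in Gb/s."""
    return dataset_gb * 8 / collect_s

nsls  = rate_gbps(6.9, 720)  # ADSC 315r: 6.9 GB over 12 min
eiger = rate_gbps(9.0, 36)   # Eiger-class: 9 GB (4X compressed) over 36 s

print(f"NSLS CCD:    {nsls:.3f} Gb/s")  # ~0.077 Gb/s -> fine on Gigabit
print(f"Eiger 2015:  {eiger:.1f} Gb/s")  # 2 Gb/s -> needs 10 Gigabit
```

The same arithmetic applied to the Pilatus table lands between the uncompressed and 4X-compressed sizes, which is presumably why the slide quotes 1.3 Gb/s there; either way the conclusion holds: Gigabit links are saturated, 10 Gigabit links are not.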

Additional factors demanding high network bandwidth

- High-performance MX beam lines will employ next-generation pixel-array detectors to fully exploit the available flux and optimized data collection methods; the estimated network bandwidth will be needed.
- We propose to operate the highly automated MX beam line in a multi-user, time-shared mode. Every data collection will be analyzed asynchronously, which will demand sustained high data transfer capacity.
- We propose to lay out data processing nodes in duplicate in support of the multi-user, time-shared concept. This will further tax networking bandwidth.

Solutions:
- 10 Gigabit networking is required.
- Reduce data transfers by processing in the detector CPU, equipped with TB-scale memory.

Networking and remote access to MX beam lines

[Diagram: offsite remote users reach the data archive and an MX switch (if needed) through the NSLS-II switch; each beam line hangs off a beam line switch with its own disk farm, Pilatus detector, and onsite remote users.]

Given the speed of data collections:
- Crystal screening will mostly vanish.
- Data evaluation will be asynchronous.
- Users will work remotely.

Onsite remote users work from LOBs and offices using an NX/Q client served by an NX server, as John Skinner demonstrated on Monday. Offsite remote users will control the experiment asynchronously, in quasi real time, using simple web-based interfaces.

A bit about software and computing hardware

- Continue to use crystallographic software developed by others, thus leveraging proven performance and community acceptance. Build these modules into the local architecture.
- For data reduction, consider a detector-controller-associated CPU equipped with vast memory (TB). Several reduction packages (e.g., XDS, d*TREK) are already parallelized and tested. This will help in getting fast feedback to rank frames, assess radiation damage, and make strategy decisions.
- Use processors with GPUs, rather than clusters, for data scaling, integration, solving, etc. This works for Global Phasing.
- Serve only occasional aggregates of data frames (e.g., 10X) for users' inspection.
- We will need truly fast new software that is competent and reliable in crystallographic decision making.

Archiving is a challenge

Fast-paced data collection demands reliable archiving.

- On the 24-hour scale, raw data should be available at the beam line as if in real time. Volume: 9 GB/min x 1440 min/day = 13 TB/day/beamline. Assuming 2 lines at full throttle, 2 at half, and 2 at one quarter: 45 TB/day.
- On the one-week scale, data should be available on short recall for reprocessing. Volume: 318 TB/week; per beam line this amounts to 4 units of 30 TB, 48-disk modules.
- Data should be held for one year, to match the experiment-to-PDB-and-publication cycle and to allow critical systems analysis. Volume: 17,464 TB/year, roughly 17 PB/year.
- Long-term archiving is an institutional-size task.
- Given the modest size of a data set (9 GB), users may still carry away raw data on solid media.
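The archive volumes above are back-of-envelope arithmetic from the per-beamline rate and the assumed duty factors. A sketch (Python) of that calculation:

```python
# Archive-volume arithmetic from the slide: one 9 GB compressed data set
# per minute per beam line, with six lines at assumed duty factors
# (two at full throttle, two at half, two at one quarter).

GB_PER_MIN = 9
per_line_tb_day = GB_PER_MIN * 1440 / 1000   # ~13 TB/day/beamline

load_factor = 2 * 1.0 + 2 * 0.5 + 2 * 0.25   # 3.5 beamline-equivalents

daily  = per_line_tb_day * load_factor       # ~45 TB/day
weekly = daily * 7                           # ~318 TB/week
yearly = daily * 365 / 1000                  # ~17 PB/year

print(f"{daily:.0f} TB/day, {weekly:.0f} TB/week, {yearly:.1f} PB/year")
```

This comes out near 45 TB/day, 318 TB/week, and 16-17 PB/year, matching the slide's figures to within rounding.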

MX beam lines will need a dedicated, specialized software team

NSLS experience shows that an MX group needs a cross-trained team that together can fill any of the following jobs:
1. A software architect directing the integration of software developed by crystallographers and others in the field
2. A specialist to build, tune, and advance network and systems performance, and to attend to the development of first-class remote capabilities
3. A systems analyst to maintain the 100+ nodes of a group of MX beam lines
4. A specialist concentrating on database programming
5. A specialist for device-level EPICS programming and device integration
6. A creative member to drive developments such as advanced GUIs, applications of machine vision, or initiatives in robotics

Common interests of MX villagers and NSLS-II facility staff

Remote: Remote participation and asynchronous data collection will be the rule from day one; this requires that offsite and onsite remote users have unimpeded, simple access to beam line computers and station monitors.

Staff: A dedicated MX software and computing staff is required to drive continuous upgrades and maintain uninterrupted service to the MX community. This staff and their colleagues at NSLS-II will benefit from close, organized interactions focused on training, collaborations, and career development.

Networking: To provide optimal data exchange and experimenter mobility between MX beam lines, these should be on one facility-independent, high-bandwidth network of the 10 Gigabit class.

Archiving: Long-term archiving of raw diffraction data should be provided through an institutional mechanism.