Scaling up to Production


Overview
- Productionize then Scale
- Building Production Systems
- Scaling Production Systems
- Use Case: Scaling a Production Galaxy Instance
- Infrastructure Advice

PRODUCTIONIZE THEN SCALE

Productionize then Scale
- Productionize: put into operation for end users
- Scale up: ability to manage increasing workloads
- The characteristics of the production process affect how you scale up

Whole Genome Sequencing
- WGS for humans
  - 60x coverage
  - ~200 GB of read data
  - ~80 MB of variant data
  - 3 hours of compute (e.g. Intel's highly optimized algorithms)
Source: Gullapalli et al., "Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics", Journal of Pathology Informatics, 2012, Volume 3, Issue 1 [p. 40]

WGS in the Clinic
- Patients
- Physicians
- Clinics
- Insurance
- Test Lab
- Regulators
- Developers
- EMR
Source: Gullapalli et al., "Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics", Journal of Pathology Informatics, 2012, Volume 3, Issue 1 [p. 40]

WGS at Scale
- Every newborn: 140 million/yr. PER YEAR:
  - 28,000 PB of read data
  - 11.2 PB of variant data
  - 4.2 x 10^8 hours of compute
- Other applications
  - Cancer T/N, time series, multi-tissue
  - Familial studies
Source: Gullapalli et al., "Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics", Journal of Pathology Informatics, 2012, Volume 3, Issue 1 [p. 40]
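The yearly totals follow from multiplying the per-genome figures (about 200 GB of reads, 80 MB of variants, 3 hours of compute) by 140 million genomes. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 10^6 GB = 10^9 MB); the constant and function names are illustrative and not from the slides:

```python
# Per-genome figures from the Whole Genome Sequencing slide (60x human genome).
READ_GB_PER_GENOME = 200        # ~200 GB of read data
VARIANT_MB_PER_GENOME = 80      # ~80 MB of variant data
COMPUTE_HOURS_PER_GENOME = 3    # ~3 hours of compute

GENOMES_PER_YEAR = 140_000_000  # "every newborn", per the slide


def yearly_totals(genomes):
    """Scale the per-genome figures to a yearly total.

    Assumes decimal units: 1 PB = 10**6 GB = 10**9 MB.
    """
    return {
        "read_pb": genomes * READ_GB_PER_GENOME / 10**6,
        "variant_pb": genomes * VARIANT_MB_PER_GENOME / 10**9,
        "compute_hours": genomes * COMPUTE_HOURS_PER_GENOME,
    }


totals = yearly_totals(GENOMES_PER_YEAR)
for name, value in totals.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the sketch gives 28,000 PB of reads, 11.2 PB of variants, and 420 million compute hours per year. Changing `GENOMES_PER_YEAR` gives a quick estimate for a smaller population, such as a single clinic.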

BUILDING PRODUCTION SYSTEMS

Building Production Systems: Overview
- Understanding Users
- Managing Data
- Assessing Applications
- Additional Considerations
- Optimizing the Process
- Managing Change

Understanding Users
- User profiles
  - End users, support staff, finance, etc.
- Use cases
- Number of users
Tip: create a quick matrix of user profiles and the number of users within each profile, currently and in 3 years

Managing Data
- Data amount
- Data access
- Data characteristics
- Data locality
Tip: write down an estimate for the low and high bound of
- data coming into the system, and from where
- data generated by the system
- data being delivered from the system, and to where

Assessing Applications
- Application types
- Application requirements
- Application characteristics
Tip: document a few runs to get an idea of the memory use, CPU use, runtime, etc. of an application
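One way to document a run, as the tip suggests, is to wrap the application in a small script that records wall time, CPU time, and peak memory. The sketch below is an illustration only: it assumes a Unix-like host (the standard-library `resource` module is not available on Windows), and `profile_run` is a hypothetical helper name.

```python
import resource
import subprocess
import sys
import time
from typing import Sequence


def profile_run(cmd: Sequence[str]) -> dict:
    """Run cmd once and report wall time, CPU time, and peak memory.

    CPU time is the user + system time added by this child process.
    ru_maxrss is the largest peak resident set size among all child
    processes waited for so far, in KiB on Linux and bytes on macOS.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(list(cmd), check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        "max_rss": after.ru_maxrss,
    }


# Stand-in workload; replace with the real application command line.
stats = profile_run([sys.executable, "-c", "sum(range(10**6))"])
print(stats)
```

Repeating this for a few representative inputs gives a rough picture of whether an application is CPU bound, memory bound, or dominated by waiting (wall time much larger than CPU time).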

Regulatory Compliance
- Revolves around organizational policies to enforce best practices
  - ISO, Good Clinical Practice, Safe Harbor, CLIA, FDA, EMR/EHR, EMA
- Significant time and financial investment is required to achieve regulatory compliance

Optimizing the Process
- Evaluate users, data, applications, and other considerations individually and as a whole
  - What are the bottlenecks?
  - What can be optimized?
  - What can be automated?

Optimizing the Process
- Bottlenecks
  - CPU bound
  - IO bound
  - Memory speed
  - Memory size
  - Network bandwidth
  - Network latency
  - Instrument capture & data movement
- Optimization
  - Application
  - CPU
  - Memory
  - Input/Output
  - Networking
  - Options
  - Program translation
- Automation
  - QC
  - Business rules
  - Testing
  - Notifications
  - Workflows
  - Error handling
  - Reporting

Optimizing the Process: WGS Example
- Network from sequencer to storage
- Storage space for data
- Available RAM and compute
- Accessibility of data to compute resources
- Secure storage and data transfer
- Parallelization of mapping and variant calling
- Automated QC metrics

Managing Change
- We are in a dynamic field
  - Tools will change, metrics will be refined, file formats will evolve, new regulations will be made
- Key concepts often forgotten
  - Modularization
  - Interoperability
  - Usability

SCALING PRODUCTION SYSTEMS

Scaling Production Systems
- When to Scale
- How to Scale
- Infrastructure

When to Scale
BE PROACTIVE, NOT REACTIVE
- Forecast increases in
  - number of users
  - number of jobs
  - computational intensity of jobs
  - amount of data
- Periodic reassessment of forecasts
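A forecast can start as a simple compound-growth projection that is revisited at each reassessment. The sketch below assumes a constant annual growth rate, which is a simplification; the function names and the example numbers (100 TB today, 50% growth, 300 TB capacity) are hypothetical and not from the slides.

```python
def forecast(current, annual_growth, years):
    """Project a quantity forward assuming constant compound annual growth.

    Returns a list with one value per year, starting with year 0 (today).
    """
    return [current * (1 + annual_growth) ** year for year in range(years + 1)]


def first_year_over(capacity, projection):
    """Return the first year index where the projection exceeds capacity, or None."""
    for year, value in enumerate(projection):
        if value > capacity:
            return year
    return None


# Hypothetical example: 100 TB stored today, growing 50% per year.
storage_tb = forecast(current=100, annual_growth=0.5, years=3)
print(storage_tb)                        # [100.0, 150.0, 225.0, 337.5]
print(first_year_over(300, storage_tb))  # 3
```

The same two functions apply to users, jobs, or compute hours. The useful output is the lead time before a limit is reached, which is what makes scaling proactive rather than reactive.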

How to Scale
Factors to consider when scaling:
- Number of users
- Number of jobs
- Types of jobs
- Amount of data

Number of Users
- Access control
- User account management
- Resource allocation
- Prioritization
- Individual usage monitoring/tracking

Number of Jobs
- Job submission management
  - Job queues
  - Priorities
  - Status information
- Ability to appropriately use resources for jobs
  - Load balancing
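The pieces named here (queues, priorities, status information, load balancing) are normally provided by a job scheduler. Purely as an illustration of how they fit together, the toy sketch below uses a priority heap and a least-loaded-worker rule. The `JobQueue` class, job names, and node names are all hypothetical, and this is not how any particular scheduler is implemented.

```python
import heapq
import itertools


class JobQueue:
    """Toy priority queue: lower priority number runs first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves submission order
        self.status = {}                 # job name -> human-readable status

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._order), name))
        self.status[name] = "queued"

    def dispatch(self, worker_load):
        """Pop the next job and assign it to the least-loaded worker.

        worker_load maps worker name -> number of running jobs and is updated in place.
        Returns (job, worker), or None if the queue is empty.
        """
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        worker = min(worker_load, key=worker_load.get)
        worker_load[worker] += 1
        self.status[name] = f"running on {worker}"
        return name, worker


queue = JobQueue()
queue.submit("align", priority=2)
queue.submit("qc", priority=1)
load = {"node1": 1, "node2": 0}
print(queue.dispatch(load))  # ('qc', 'node2')
print(queue.status)
```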

Types of Jobs
- Memory intensive
- IO intensive
- Compute intensive
- Optimized/custom applications

Amount of Data
- Tiered storage
  - Handles different levels of availability
  - Cost trade-offs
  - Implement data management policy
- Data transfer
  - Network bandwidth and latency
  - Data movement accelerators
- Data ingestion
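Network bandwidth sets a lower bound on how quickly data can be moved between tiers or sites. A back-of-the-envelope sketch, assuming decimal units (1 GB = 8 x 10^9 bits), ignoring latency and protocol details, and using an assumed 80% link efficiency; the function name and the efficiency value are illustrative only:

```python
def transfer_hours(size_gb, link_gbps, efficiency=0.8):
    """Estimate hours needed to move size_gb over a link of link_gbps.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    Latency and per-file overhead are ignored.
    """
    bits = size_gb * 8e9
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# One ~200 GB genome's worth of reads over two common link speeds.
for gbps in (1, 10):
    print(f"{gbps} Gb/s: {transfer_hours(200, gbps):.2f} h")
```

Under these assumptions a single 200 GB data set takes roughly half an hour on a 1 Gb/s link, so link speed becomes a real constraint once data sets arrive continuously.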

Infrastructure
- Technology
  - Fat nodes & appliances
  - Rackmounts and blades
  - Scale-out architectures
  - Shared memory architectures
- Network requirements

CASE STUDY: SCALING A PRODUCTION GALAXY INSTANCE

Scaling a Production Galaxy Instance
- Galaxy Overview
- Considerations for Local Installation
- Scaling Galaxy

Galaxy Overview
- Galaxy is an open, web-based platform for data-intensive biomedical research

Considerations for Local Installation
- Hosting Galaxy Locally
- Managing the Software
- Supporting the Users

Hosting Galaxy Locally
- Host server must have sufficient storage because Galaxy needs direct access to data
  - Input data, intermediate data, results, metadata, static data resources
- Host server must have a sufficient network connection
- Host server needs sufficient compute resources
  - Analysis tools

Hosting Galaxy Locally
- Personal workstation
  - Flexibility that comes with self-management
  - Very limited resources, requires know-how
- Local shared cluster
  - Better ROI on upfront install effort investment
  - A lot of support and management overhead
- Appliance
  - Dedicated high-performance server
  - Automated software management
  - Leverages other infrastructure to scale

Managing the Software
- Galaxy software versions
- Analysis tool versions
- Software dependencies
- Performance optimization

Supporting Users
- Number of users
  - Manage job submission
  - Manage user accounts
  - Set up quotas
- Support services

Scaling Galaxy
- Software
- Hardware
  - Storage
  - Compute
  - Network

Software
- Job scheduler
  - Grid Engine
- Galaxy database
  - PostgreSQL, MySQL
- Proxy server
  - Apache
- Optimize configurations

Hardware: Storage
- Storage local to the host server must be large enough to store all data handled by Galaxy
  - Increase local capacity
  - Network more storage
  - Port instance to a machine with more storage
- Backup
  - Move data off local storage to make room

Hardware: Compute
- Host must have access to sufficient compute
  - Port Galaxy to a more powerful host
  - Build additional computational resources on the Galaxy host
  - Leverage job scheduler to span resources
  - Burst to the cloud or other clusters

Hardware: Network
- Connectivity to data sources
- Internal network
  - Connection to HTP data generation instrument
- External network
  - Supporting a global user base
- Transfer protocols
  - Optimized tools for transferring data

INFRASTRUCTURE ADVICE

Infrastructure Advice
- Science is changing faster than we can refresh IT
- Consider future flexibility as much as current needs
- Avoid things that lock you into a vendor or platform
- Continually evaluate your default assumptions

Infrastructure Advice
- Physical and network data ingestion
  - Think about edge cases and the unexpected
  - Don't go crazy with upfront investment
- Compute and analysis
  - Pay attention to areas that need optimization for your operations

Infrastructure Advice
- Cloud strategy
  - Spend time to develop policies and procedures, assess risk, etc.
  - Consider laying the technical groundwork now so it is easier to make use of the cloud when needed

Infrastructure Advice
- Storage and data management
  - Spend the bulk of attention and budget here
  - Understand the diversity of products and features to minimize risk and mistakes
  - Define your storage approach to match your organization's funding and staffing model

Final Takeaway
- If you think big data is here now...
- Know your key systems, technologies, and bottlenecks
- Researchers and IT must work together to build environments that enable science

Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.