Advances in Virtualization In Support of In-Memory Big Data Applications

9/29/15 HPTS 2015 1 Advances in Virtualization In Support of In-Memory Big Data Applications
SCALE SIMPLIFY OPTIMIZE EVOLVE
Ike Nassi, ike.nassi@tidalscale.com

9/29/15 HPTS 2015 2 What is the Problem We Solve?
Fast access to big data; faster access if the data is in memory
But often there's too much data: it can't always fit in memory
You can't always buy a bigger computer (with more memory)
So the only alternative has been to scale out
Scale-out generally requires new software or a software rewrite ($$), and it is not always easy
Our solution is to create a single larger virtual machine
This lets users scale up without $$ supercomputer HW costs
No software changes required
And we'll show you how well this works!

9/29/15 HPTS 2015 3 Status
Up and running for 11 months
Compatible with all software tested
Automated 24x7 test system
3 customer trials completed in the last 6 weeks (Analytics, Big Data, EDA)
Supporting CentOS 6.5, 6.6, 6.7, RHEL, FreeBSD
Passing the full suite of Linux Test Project tests (for servers)
Preparing RC-2
25-node cluster for dev; 3 (+1) x 5-node clusters for customers & test
No longer a research project

9/29/15 HPTS 2015 4 Traditional View of Virtualization
[Diagram: several virtual machines sharing a single physical machine]

9/29/15 HPTS 2015 5 Alternative: Provide Hardware Aggregation
[Diagram: a guest OS in a single large virtual machine running across multiple physical machines (inverse virtualization)]

9/29/15 HPTS 2015 6 Scale out vs. Scale up and out?
[Diagram: in today's scale-out model, each query (Query1, Query2, Query3) runs against a separate DB/OS/HW stack; in the TidalScale model, a single application, database, and operating system run on a hyperkernel layer spanning multiple HW nodes]
The application, database, and operating system are completely, 100%, bit-for-bit unmodified

9/29/15 HPTS 2015 7 Scale up and Scale out (The Best of Both)
[Chart: software simplicity vs. HW cost]

9/29/15 HPTS 2015 8 Everything Just Works
Just a few examples of many

9/29/15 HPTS 2015 9 Price/Performance at Scale (late 2014, list price)
[Chart: price/performance comparison; no disks; includes license]

9/29/15 HPTS 2015 10 Performance Scales Up
[Chart: 200 cores vs. 60 cores]
Advantages:
1. Highly scalable
2. More memory & PCI bandwidth
3. 3.33x the cores
4. 40% of the cost

9/29/15 HPTS 2015 11 Recent Hardware Example
POC-2 HW: 120 processors @ 3.4 GHz, 3.84 TB, 20x 500 GB SSD, < $80K (with tax)
As currently booted (POC-2 current guest VM): 64 processors @ 3.4 GHz, 3.2 TB, 2x 500 GB SSD (>2 in test)

9/29/15 HPTS 2015 12 Screenshots
[Screenshots: the guest VM reporting 3.2 TB DRAM and 64 processors]

9/29/15 HPTS 2015 13 Working Sets*
45 years ago we figured out how to virtualize memory using working-set locality. Today, locality is applied ubiquitously across our computing infrastructure. TidalScale applies mobility & locality to all primary resource types (processors, memory, Ethernets, storage) automatically & dynamically across physical machines.
* P. J. Denning
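A quick illustration of the working-set idea (a minimal Python sketch of Denning's model, not TidalScale code; the trace and names are invented): the working set W(t, tau) is the set of distinct pages referenced in the trailing window (t - tau, t].

def working_set(ref_trace, t, tau):
    # Distinct pages referenced in the window (t - tau, t] of a (time, page) trace.
    return {page for time, page in ref_trace if t - tau < time <= t}

# Illustrative trace of (time, page) references.
trace = [(1, 'A'), (2, 'B'), (3, 'A'), (4, 'C'), (5, 'A'), (6, 'D')]
print(working_set(trace, t=6, tau=3))  # references at times 4..6 -> {'A', 'C', 'D'}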

9/29/15 HPTS 2015 14 Nodes, Memory and Processors
The hyperkernels aggregate all the processors, all the memory, all the storage, and all the Ethernets (except for the private interconnect, which we treat more as a system bus than a network). We treat these resources as virtual and mobile.
The hyperkernels bind virtual resources to physical resources on demand. We move pages, vCPUs, interrupts, clocks, etc.
There is no master node and no shared state among instances of the hyperkernel. Scheduling is purely distributed and peer-to-peer.
The hyperkernel uses hardware virtualization extensions; the guest OS uses the first-level page tables, and the hyperkernels control the second-level page tables.
Memory is strongly coherent. We keep multiple read-only copies of pages across the cluster and do the normal invalidation when writes change page ownership.
The hyperkernel sits beneath the OS and above the hardware. The VM looks like hardware to the OS, and the hyperkernel does not need to know which OS or applications are running.
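To picture the coherence rule, here is a hedged Python sketch of a per-page directory that allows many read-only copies but invalidates all of them when a write transfers ownership. All names are hypothetical; this is not TidalScale's implementation.

class PageDirectory:
    def __init__(self, owner):
        self.owner = owner      # node holding the writable copy
        self.readers = {owner}  # nodes holding read-only copies

    def read(self, node):
        # A remote read replicates the page; copies stay valid until a write.
        self.readers.add(node)

    def write(self, node):
        # Invalidate every other copy, then transfer exclusive ownership.
        for reader in self.readers - {node}:
            print(f"invalidate copy on node {reader}")
        self.owner = node
        self.readers = {node}

page = PageDirectory(owner=0)
page.read(1); page.read(2)  # nodes 0, 1, 2 share read-only copies
page.write(2)               # invalidates nodes 0 and 1; node 2 becomes owner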

9/29/15 HPTS 2015 15 When Does the HK Get Involved?
[Diagram: the application, database, and operating system run above a hyperkernel instance on each HW node; a guest exit transfers control to the local hyperkernel]

9/29/15 HPTS 2015 16 Exits = Stalls
No exits = no overhead [in which case we run at machine speed]
The hyperkernel binds virtual processors to physical processors
Each exit is caused by an event that cannot be handled by the guest, e.g. remote interrupts, access to a remote I/O device, remote memory access, etc.
The exit is trapped and analyzed; strategies are evaluated
For each exit type, there is a unique set of strategies with costs:
cost_s1 = w1*f1(i1) + w2*f2(i2) + ... + wn*fn(in)
cost_s2 = w1*f1(i1) + w2*f2(i2) + ... + wn*fn(in)
cost_selected = min(cost_s*) [migrate a vCPU, or move/copy a page]
The hyperkernel keeps track of the updated state as we go along
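To make the selection step concrete, here is a small Python sketch of the cost model as stated above: each candidate strategy's cost is a weighted sum of factor functions evaluated on the current state, and the cheapest strategy is chosen. The weights, factor functions, and state variables below are invented for illustration.

def cost(strategy, state):
    # cost_s = w1*f1(i1) + w2*f2(i2) + ... + wn*fn(in)
    return sum(w * f(state) for w, f in strategy['terms'])

def select_strategy(strategies, state):
    # cost_selected = min(cost_s*)
    return min(strategies, key=lambda s: cost(s, state))

# Hypothetical remote-memory-access stall with two candidate strategies.
state = {'page_heat': 0.8, 'vcpu_cache_warmth': 0.3}
strategies = [
    {'name': 'migrate the vCPU to the page',
     'terms': [(1.0, lambda s: 1 - s['vcpu_cache_warmth'])]},
    {'name': 'move/copy the page to the vCPU',
     'terms': [(1.0, lambda s: s['page_heat'])]},
]
print(select_strategy(strategies, state)['name'])  # -> migrate (0.7 < 0.8)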

9/29/15 HPTS 2015 17 Performance (late 2014)
TidalScale delivered 47x to 62x speedup over HDD

Rows per second (3 query types across 100 million rows)
Query type                                             Bare metal 128G   TidalScale 192G
Simple equality                                        7,928             494,560
Conjunction of simple predicates                       7,925             373,413
Conjunction of simple predicates and one disjunction   7,941             382,117

Test performed with 4 cores at 2200 MHz; the TidalScale system was 2x 96G

9/29/15 HPTS 2015 18 Performance (always a work in progress)
Our goal is to test in-memory performance on the machine described earlier: 5 nodes, 3.2 TB, 64 processors, 2x 500 GB SSD
TPC-H data by itself does not sufficiently exercise TidalScale, so we load up mysqld simultaneously with both TPC-H data and customer data
Our test varies the memory configuration and compares performance with and without vCPU migration and/or page spillover

9/29/15 HPTS 2015 19 Experiment: Memory setup

mysqld memory config                           Experiment 1   Experiment 2           Experiment 3
Memory                                         SSD            400 GB, no migration   1 TB, with migration
innodb cache (1 TB)                            0              400 GB                 400 GB
mysqld temp memory (1 TB/table, 1.5 TB heap)   0              0                      600 GB
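As a rough guide to the table, the Experiment 3 column corresponds to mysqld settings along these lines. This is an assumed mapping onto standard MySQL options (innodb_buffer_pool_size for the innodb cache row; tmp_table_size and max_heap_table_size for the per-table and heap temp-memory limits); the actual option file used in the tests is not shown in the slides.

# Hypothetical invocation approximating Experiment 3
mysqld --innodb-buffer-pool-size=400G \
       --tmp-table-size=1024G \
       --max-heap-table-size=1536G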

9/29/15 HPTS 2015 20 The experiment
i=tpch
echo `time sh -c "echo 'select count(c_custkey) from customer' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(p_partkey) from part' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(n_nationkey) from nation' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(r_regionkey) from region' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(s_suppkey) from supplier' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(ps_partkey) from partsupp' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(o_orderkey) from orders' | mysql -u tpch -ptpch $i"` &
echo `time sh -c "echo 'select count(l_orderkey) from lineitem' | mysql -u tpch -ptpch $i"` &
wait

9/29/15 HPTS 2015 21 Customer + TPC-H (speedup)
[Bar chart, speedup 0-12, for three configurations: run from SSD (~80 GB); run from 400 GB allocated and instantiated memory, no vCPU migration; 1 TB allocated and instantiated, with vCPU migration]

9/29/15 HPTS 2015 22 Analysis of the experiment
1. Using larger memories gives us a 3x to 9x performance speedup over already fast SSDs
2. Use your favorite factor to extrapolate to HDDs
3. While we processed 400 GB of data into cache, we in fact allocated an additional 600 GB (1 TB - 0.4 TB), so we had a lot of memory cache headroom on the existing hardware for processing larger data sets
4. The difference between experiments 2 and 3 is that you can now have 600 GB of temporary tables in memory that can be used for subsequent work without having to write the temporary results to disk
5. Migration did not appreciably slow down the elapsed time between experiments 2 and 3

9/29/15 HPTS 2015 23 Difficulty: How to evaluate performance
Try before you buy: currently we try to test real apps on real data on systems we host
If it doesn't crash, and it works with all tested software, then what?
Scalability based on data size? Linearity? Who knows? It's algorithm dependent
Fix the data size, scale the diameter? Better
Best might be to hold the data size constant, select deterministic tests, and vary the cluster diameter
I'm open to other suggestions!

9/29/15 HPTS 2015 24 Lessons Learned - I
Keeping more data in memory reduces paging overhead. Duh, but not so easy
Gentle waves are better than tidal waves
Traditional HW has one memory wall; we have multiple walls
The hyperkernel minimizes or eliminates shared state
Each node looks out for itself: increased recovery for mental breakdowns
Adhere to the prime directive: look like hardware. Never change the guest OS. Never change the app
Intuition about the necessary speed of the interconnect is generally wrong. We don't saturate the interconnect

9/29/15 HPTS 2015 25 Lessons Learned - II
When things go wrong, the system recovers. Not always what you want in the development phase!
Train the system rather than tune the system
Synchronizing distributed time is hard
Because we don't change the guest or the apps, our users don't risk breaking them
It's hard to introduce hyperkernel bugs when we're not modifying code
Scale the computer to the problem, not the converse
Don't necessarily believe people who say "It can't be done!" or "It's been tried before, and it doesn't work"

9/29/15 HPTS 2015 26 Summary
Scale: Aggregates compute resources for large-scale in-memory analysis and decision support. Scales like a cluster using commodity hardware, at linear cost. Allows customers to grow gradually as their needs develop
Simplify: Dramatically simplifies application development: no changes! No need to distribute data or work across servers. Existing applications run as a single instance, without modification, as if on a highly flexible mainframe
Optimize: Automatic, dynamic, hierarchical resource optimization
Evolve: Applicable to modern and emerging microprocessors, memories, interconnects, persistent storage & networks

9/29/15 HPTS 2015 27 SCALE SIMPLIFY OPTIMIZE EVOLVE
Contact: Ike Nassi, ike.nassi@tidalscale.com