Project Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008



Similar documents
In Memory Accelerator for MongoDB

XAP 10 Global HTTP Session Sharing

Scaling Spring Applications in 4 Easy Steps

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Database Services for CERN

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

Cluster, Grid, Cloud Concepts

<Insert Picture Here> Oracle In-Memory Database Cache Overview

GigaSpaces XAP 10.0 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Cognos8 Deployment Best Practices for Performance/Scalability. Barnaby Cole Practice Lead, Technical Services

GigaSpaces XAP 9.7 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

GigaSpaces XAP.NET Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP.NET DISTRIBUTED SYSTEMS

Introduction to the NI Real-Time Hypervisor

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

McAfee Agent Handler

Performance Management for Cloudbased STC 2012

Enabling High performance Big Data platform with RDMA

GigaSpaces Real-Time Analytics for Big Data

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

<Insert Picture Here> WebLogic High Availability Infrastructure WebLogic Server 11gR1 Labs

Amazon EC2 Product Details Page 1 of 5

Learn Oracle WebLogic Server 12c Administration For Middleware Administrators

Using Apache Derby in the real world

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Tier Architectures. Kathleen Durant CS 3200

<Insert Picture Here> Getting Coherence: Introduction to Data Grids South Florida User Group

Introduction to Big Data Training

In-Memory BigData. Summer 2012, Technology Overview

Cloud Based Application Architectures using Smart Computing

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Online Transaction Processing in SQL Server 2008

Provisioning and Resource Management at Large Scale (Kadeploy and OAR)

GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS

Manjrasoft Market Oriented Cloud Computing Platform

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Apache Hadoop. Alexandru Costan

Server Virtualization with Windows Server Hyper-V and System Center

Configuring and Managing a Private Cloud with Enterprise Manager 12c

I/O Considerations in Big Data Analytics

Xythos WebFile Server Architecture A Technical Guide to the Core Technology, Components, and Design of the Xythos WebFile Server Platform

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

Eloquence Training What s new in Eloquence B.08.00

Scalable Architecture on Amazon AWS Cloud

Scaling out a SharePoint Farm and Configuring Network Load Balancing on the Web Servers. Steve Smith Combined Knowledge MVP SharePoint Server

Software-defined Storage Architecture for Analytics Computing

Towards Smart and Intelligent SDN Controller

Cloud Computing Architecture: A Survey

Tushar Joshi Turtle Networks Ltd

Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications

Using an In-Memory Data Grid for Near Real-Time Data Analysis

TRANSFORMING DATA PROTECTION

Whitepaper. Vertex VDI. Tangent, Inc.

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Protect SAP HANA Based on SUSE Linux Enterprise Server with SEP sesam

EMC XTREMIO EXECUTIVE OVERVIEW

Table Of Contents. 1. GridGain In-Memory Database

Cloud Based Distributed Databases: The Future Ahead

HP OO 10.X - SiteScope Monitoring Templates

Elastic Application Platform for Market Data Real-Time Analytics. for E-Commerce

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

Real-Time Analytics for Big Market Data with XAP In-Memory Computing

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data

Developing Scalable Java Applications with Cacheonix

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Capacity Planning for Microsoft SharePoint Technologies

Scaling Analysis Services in the Cloud

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

Technical Report. Implementation and Performance Testing of Business Rules Evaluation Systems in a Computing Grid. Brian Fletcher x

MOSIX: High performance Linux farm

Exadata and Database Machine Administration Seminar

Magic xpi 4.0 Release Notes

High Performance Computing OpenStack Options. September 22, 2015

LSKA 2010 Survey Report Job Scheduler

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

Configuration Management of Massively Scalable Systems

ONOS Open Network Operating System

VMware vcloud Automation Center 6.1

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

SQL Server 2008 Performance and Scale

Oracle Exam 1z0-599 Oracle WebLogic Server 12c Essentials Version: 6.4 [ Total Questions: 91 ]

MySQL performance in a cloud. Mark Callaghan

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

ISSN: Keywords: HDFS, Replication, Map-Reduce I Introduction:

Transcription:

Project Convergence: Integrating Data Grids and Compute Grids Eugene Steinberg, CTO May, 2008

Data-Driven Scalability Challenges in HPC Data is far away Latency of remote connection Latency of data movement through pipes Chatty algorithms are expensive Data is centralized HW Resources are limited Inevitable disk I/O due to limited RAM Connections are limited Highly concurrent access doesn't scale well 1

Usual Solution: Compute Grid + Data Grid Classic Data Grid Data is partitioned Partitions are stored in memory of data grid Data grid is deployed near to compute grid Search is parallelized over partitions Build-in replication, persistence, coherence, failover What is Achieved? Reduced latency and data moving cost Improved connection scalability Reduced data contention No Disk I/O 100% memory speed Is this the best we can do? 2

Limitations of Compute Grid + Data Grid Two separate grid environments Hardware, footprint and management costs of dual infrastructure Segregated infrastructures cannot share resources Compute Grid Sub-optimal resource utilization Compute grid is CPU-bound, not RAM-bound Data grid is RAM-bound, not CPU-bound Still sub-optimal performance Still paying for remote network calls and data movement Data Grid 3

Better Answer: Compute-Data Grid Shared hardware between compute & data grid Data grid resides in RAM of host machines Compute grid runs HPC jobs on the same host machines Opportunity to collocate processing with data Many applications support compute-data affinity No network overhead on remote calls and data movement New recipe for scalability As HPC application needs to scale in and out, data partitions are spread over larger or smaller pool of hosts 4

Project Convergence Open source reference architecture for Compute-Data Grid Goals Pluggable architecture to support adapters for many grid products Non-intrusive compute-data grid coordination Library of adapters for popular commercial and open source grids Key Use Cases Data-aware job scheduling Dynamic data grid right-scaling 5

Monitor Client Wrapper Scheduler Logical Architecture Compute Grid HOST HOST HOST Data Grid Convergence 6

Implementation Core Components Data Grid Monitor: service responsible for knowing Data Grid s topology and state Data Aware Wrapper: client side library which extends Compute Grid s scheduling API to support data-aware job scheduling Main Workflow Client code submits the job using Data Aware Wrapper Data Aware Wrapper consults Data Grid Monitor Data Grid Monitor returns a set of hosts that are nearest to the data Wrapper submits the job to the Scheduler, requesting specific hosts Variation on Configuration Monitor can be a network service or embedded as a library 7

Demo Hello, World Trading Analytics Setup 4 DataSynapse Engines, 2 per host 2 GigaSpaces partitions Scheduler: DataSynapse GridServer Client app + embedded Monitor + Wrapper Test data Stores100,000 trades for 10 stock tickers Partitioned by ticker Engine P1 GridServer Scheduler Engine P2 Job Computes simple statistics about trades A Job spawns 10 tasks, one task per ticker Task Scheduling Control Functions Data-aware, random, or anti-data-aware Wrapper Client Monitor 8

Demo Screenshot 9

Current Project State Hosted by OpenSpaces.org Licensed under Apache 2.0 Latest version 0.1.1 (Apr 2008 release) Use case supported: data-aware job scheduling Available plug-ins: Compute Grids: Data Synapse GridServer 5.0 Data Grid: GigaSpaces XAP 10

Project Roadmap Support Additional Adapters Convergence 0.2: GridGain (under development) Convergence 0.3: Oracle Coherence Convergence 0.4: Sun Grid Engine Support Additional Use Cases Dynamic data grid right-sizing Call for Action Please, join the project to help test and extend the system, or provide additional adapters http://www.openspaces.org/display/cvg/convergence 11

Q & A 12

Thank You! Eugene Steinberg, CTO esteinberg@griddynamics.com