In-Memory Computing. New Architecture for Big Data Processing. Hai Jin

Similar documents
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

An Open Source Memory-Centric Distributed Storage System

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Hadoop: Embracing future hardware

Scaling from Datacenter to Client

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Storage Class Memory and the data center of the future

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Performance Baseline of Oracle Exadata X2-2 HR HC. Part II: Server Performance. Benchware Performance Suite Release 8.4 (Build ) September 2013

Hybrid Software Architectures for Big

How NAND Flash Threatens DRAM

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Performance Management in Big Data Applica6ons. Michael Kopp, Technology

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com

Architectures for Big Data Analytics A database perspective

Big Fast Data Hadoop acceleration with Flash. June 2013

SSD Performance Tips: Avoid The Write Cliff

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

Energy Efficient MapReduce

Big Graph Processing: Some Background

Symmetric Multiprocessing

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

bigdata Managing Scale in Ontological Systems

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG

Getting the Most Out of Flash Storage

Solid State Drive Architecture

Big Data, SAP HANA. SUSE Linux Enterprise Server for SAP Applications. Kim Aaltonen

Infrastructure Matters: POWER8 vs. Xeon x86

Big Data Analytics - Accelerated. stream-horizon.com

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

Memory-Centric Database Acceleration

Bringing Big Data Modelling into the Hands of Domain Experts

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

IN-MEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: In-Memory DBMS - SoSe

Mark Bennett. Search and the Virtual Machine

SQream Technologies Ltd - Confiden7al

Diablo and VMware TM powering SQL Server TM in Virtual SAN TM. A Diablo Technologies Whitepaper. May 2015

Price/performance Modern Memory Hierarchy

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

System Architecture. In-Memory Database

Big Data Functionality for Oracle 11 / 12 Using High Density Computing and Memory Centric DataBase (MCDB) Frequently Asked Questions

Exadata Database Machine

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Enlarge Bandwidth of Multimedia Server with Network Attached Storage System

Oracle Database In-Memory The Next Big Thing

BSPCloud: A Hybrid Programming Library for Cloud Computing *

Evaluation Methodology of Converged Cloud Environments

2009 Oracle Corporation 1

Sun Constellation System: The Open Petascale Computing Architecture

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

How To Improve Performance On A Single Chip Computer

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Use of Hadoop File System for Nuclear Physics Analyses in STAR

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Architecture Support for Big Data Analytics

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Embracing Open Source: Practice and Experience from Alibaba. Wensong Zhang The 11 th Northeast Asia OSS Promotion Forum

Big Data Performance Growth on the Rise

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

SAP Business Suite powered by SAP HANA

Recommended hardware system configurations for ANSYS users

GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs

NAND Flash Architecture and Specification Trends

Experiments on cost/power and failure aware scheduling for clouds and grids

Using RDBMS, NoSQL or Hadoop?

Networking Virtualization Using FPGAs

A Study of Application Performance with Non-Volatile Main Memory

Developing Scalable Java Applications with Cacheonix

Oracle Big Data SQL Technical Update

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

HPC performance applications on Virtual Clusters

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Integrating Apache Spark with an Enterprise Data Warehouse

Moving From Hadoop to Spark

Transcription:

In-Memory Computing New Architecture for Big Data Processing Hai Jin Cluster and Grid Computing Lab Services Computing Technology and System Lab Big Data Technology and System Lab School of Computer Science and Technology Huazhong University of Science and Technology Wuhan, 430074, China hjin@hust.edu.cn

Big Data: More Than Big

Varieties of Data Internet data Scientific data Multimedia data Enterprise/IoT data

Characteristics of Enterprise/IoT Data Regular structure and expression Rapid growth Timeliness data processing High value density High security requirement

In-Memory Computing

Gartner Top 10 Strategic Technology Trends for 2013

Overview of SAP HANA

Oracle s Big Data Appliance/Exalytics

Energy Consumption in Modern Electronic ICT Systems

DRAM Read Power vs. Data Rate

Downsides of DRAM Refresh Energy consump.on: Each refresh consumes energy Performance degrada.on: DRAM bank unavailable while refreshed QoS/predictability impact: (Long) pause.mes during refresh Refresh rate limits DRAM capacity scaling

Refresh Overhead: Performance 46% 8%

Refresh Overhead: Energy 47% 15%

Emerging Non-volatile Memory (NVM) Technologies 15/11/10

Intel-Micron 3D XPoint Technology 15/11/10

International Technology Roadmap for Semiconductors Projection

Intel-Micron 3D Xpoint Technology 15/11/10

New Advances on Memory and Storage -Storage Class Memory

Storage Class Memory SCM is a new class of data storage and memory devices SCM blurs the dis.nc.on between MEMORY (= fast, expensive, vola/le ) and STORAGE (= slow, cheap, non- vola/le) Characteris.cs of SCM Solid state, no moving parts Short access.mes (~ DRAM like, within an order- of- magnitude) Low cost per bit (DISK like, within an order- of- magnitude) Non- vola.le ( ~ 10 years)

20 Emerging NVM Technological Choice Time Slots by Application

Reconstruction of Virtual Memory Architecture: Break the I/O Bottleneck 21

New In-Memory Computing Architecture Build DRAM + SCM hybrid hierarchical/parallel memory structure Memory Channel I/O Channel Disk Memory Channel Data I/O Channel New In-memory Computing Architecture The new in-memory computing architecture supports the shift from computing-centric to combination of computing and data 22

100x Faster Access to Data using SCM

HP Nanostore Project

The Machine - Memory-Driven Computing

The Machine - Memory-Driven Computing

The Machine Roadmap

Technology and System of In-Memory Computing for Big Data Processing Project Overview Total budget: 170M RMB Period: January 2015 December 2017 Participants Inspur Huawei Sugon Shanghai Jiaotong University Huazhong University of Science and Technology Chongqing University National University of Defense Technology Huadong Normal University Jiangnan Computing Institute 28

Technology and System of In-Memory Computing for Big Data Processing Project Mission Hybrid NVM- based high reliable, massive storage, and low power in- memory compu.ng system, the capacity of NVM in each node should be in TB level, suppor.ng zero bootup System soyware and simula.on plazorm for in- memory compu.ng system Parallel processing system for in- memory compu.ng system In- memory database for hybrid memory architecture to support decision making and other data 29 management applica.ons

System Architecture

Full System Simulator Applications Core Core Core Core Latency-Aware Shared Cache Data Migration Service Channel Redirect Mapping PCM Libs OS Migrator DRAM Software Level Migration Libs Hardware Level Migration Strategies Designed Based On: MARSSX86+ NVMain More flexible (supports NVMain DRAMSim HybridSim as main memory simulator) Simulate memory system precisely, easy to configure

Data Migration Data Placement determines Access Latency, Energy Consumption, Device Lifetime Explicit Migration allocate and migrate virtual page specified by programmer DRAM #include lib/migrate.hh p1 int main() { Migrate_Init(); p1 = malloc(size_1); p2 p2 = malloc(size_2); Migrate(p1,size_1,PCM_MEM); Migrate(p2,size_2,DRAM_MEM); } Virtual address space PCM migration Physical address space

Data Migration Implicit Migration allocate virtual pages automatically and migrate them later Low access latency Low energy consumption Core Core Recording Access Information Page Migration Classifying Pages Energy Model PCM DRAM

Miss Penalty Aware Cache Replacement On a LLC miss Low latency: Row Buffer Hit, Access DRAM High latency: Access PCM Miss Penalty Aware Policy Balancing Data Locality and Miss Latency

Flint: Exploiting Raw Data of In-Memory Data Objects in Distributed Data-Parallel Systems l Completely decomposing in-memory data objects to eliminate the references invocation and reduce frequent garbage collection in JVM l A system to automatic converse the user codes, decompose data objects and manage in-memory raw data l Speedup from 2x to 5.67x Compare with Apache Spark App. LR KMeans PR CC Bagel- PR Bagel- CC WC Spark 15.7s 35s 930s 90s 276.6s 204s 900s Flint 6s 10s 237s 36s 56s 36s 426s Speedup 2.61x 3.5x 3.92x 2.5x 4.94x 5.67x 2.11x

Mammoth: Memory-Centric MapReduce System A novel rule- based heuris.c to priori.ze memory alloca.on and revoca.on mechanism A mul.- threaded execu.on engine, which realize global memory management Compa.ble with Hadoop Sources available at Github and ASF IEEE Computer Spotlight Execution Engine 10 36

Landscape of Disk- Based and In- Memory Data Management Systems (2014) 37

Thanks!