Large Data Visualization using Shared Distributed Resources



Similar documents
Distributed Data Storage Based on Web Access and IBP Infrastructure. Faculty of Informatics Masaryk University Brno, The Czech Republic

Principles and characteristics of distributed systems and environments

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Data Mining with Hadoop at TACC

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Parallel Visualization for GIS Applications

Parallel Large-Scale Visualization

Grid Scheduling Dictionary of Terms and Keywords

Virtualization with Windows

Cluster, Grid, Cloud Concepts

Audio networking. François Déchelle Patrice Tisserand Simon Schampijer

Efficiency Considerations of PERL and Python in Distributed Processing

Axceleon s CloudFuzion Turbocharges 3D Rendering On Amazon s EC2

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

Final Report. Cluster Scheduling. Submitted by: Priti Lohani

Violin: A Framework for Extensible Block-level Storage

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Efficient Load Balancing using VM Migration by QEMU-KVM

Is Virtualization Killing SSI Research?

In: Proceedings of RECPAD th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal

Chapter 4 IT Infrastructure and Platforms

How To Install Linux Titan

Code and Process Migration! Motivation!

Hadoop Architecture. Part 1

Xeon+FPGA Platform for the Data Center

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

EFFICIENCY CONSIDERATIONS BETWEEN COMMON WEB APPLICATIONS USING THE SOAP PROTOCOL

LDIF - Linked Data Integration Framework

GraySort on Apache Spark by Databricks

Part I Courses Syllabus

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Scalable Internet Services and Load Balancing

CHAPTER 3 REAL TIME SCHEDULER SIMULATOR

Client Overview. Engagement Situation. Key Requirements

Parallel Analysis and Visualization on Cray Compute Node Linux

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Module 15: Network Structures

Clusters: Mainstream Technology for CAE

Chapter 16: Distributed Operating Systems

Stream Processing on GPUs Using Distributed Multimedia Middleware

LOAD BALANCING TECHNIQUES

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Building a Continuous Integration Pipeline with Docker

Tableau Server 7.0 scalability

Figure 1. The cloud scales: Amazon EC2 growth [2].

Intro to Virtualization

Virtualization Techniques for Cross Platform Automated Software Builds, Tests and Deployment

Data analysis and visualization topics

How To Understand The Concept Of A Distributed System

LSKA 2010 Survey Report Job Scheduler

Grid e-services for Multi-Layer SOM Neural Network Simulation

Hadoop. Sunday, November 25, 12

Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option

Visualization of Adaptive Mesh Refinement Data with VisIt

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

Is Virtualization Killing SSI Research?

Load Balancing of Web Server System Using Service Queue Length

Chapter 14: Distributed Operating Systems

Chapter 11 I/O Management and Disk Scheduling

IOS110. Virtualization 5/27/2014 1

Luncheon Webinar Series May 13, 2013

Parallels Virtuozzo Containers vs. VMware Virtual Infrastructure:

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Networking Operating Systems (CO32010)

Grid Computing Vs. Cloud Computing

Packet-based Network Traffic Monitoring and Analysis with GPUs

Quick Start - NetApp File Archiver

Intel DPDK Boosts Server Appliance Performance White Paper

CHAPTER FIVE RESULT ANALYSIS

DevOps Course Content

Intellicus Enterprise Reporting and BI Platform

Big data blue print for cloud architecture

Chapter 11 I/O Management and Disk Scheduling

Performance Monitoring of Parallel Scientific Applications

SAP IT Infrastructure Management

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

This presentation provides an overview of the architecture of the IBM Workload Deployer product.

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

In Memory Accelerator for MongoDB

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Big Data Governance Certification Self-Study Kit Bundle

Comparative Analysis of Free IT Monitoring Platforms. Review of SolarWinds, CA Technologies, and Nagios IT monitoring platforms

bigdata Managing Scale in Ontological Systems

Transcription:

Large Data Visualization using Shared Distributed Resources Huadong Liu, Micah Beck, Jian Huang, Terry Moore Department of Computer Science University of Tennessee Knoxville, TN

Background To use large-scale shared resources for cutting edge computation jobs is a great idea Communication: the Network Storage and Computation: The Grid? To implement this vision for production use, several high-level services are needed. For example: Resource discovery and management (reservation) Data transfer Scheduling (dynamic monitoring, adaptive control) Our model is the Network, not the Comp/Data Center Logistrical Networking infrastructure and architecture Internet Backplane Protocol, exnode, LoRS NGS Funding: PIs Beck, Dongarra, Huang, Plank

Distributed Visualization Our work focuses on large data visualization: Useful when available on-demand Useful when can be shared in an executable form Use as many processors as available (beyond clusters?) Available in a widespread manner Data intensive Non-standard definition of Distributed Viz We aim to support geographically distributed users The infrastructure does not need to be centralized Our comp/storage nodes are independent network nodes

Distributed Visualization

Data Replication: IBP & exnodes IBP Depot 1 D 2 D 3 D 4 Applications LoRS:runtime system L-Bone exnode IBP File 1 exnode F2 F3 Physical Layer

Visualization Operations We constructed a set of basic visualization operations as a highly portable library: the Visualization Cookbook Library (vcblib) includes major visualization algorithms like software volume rendering, iso-surfacing and flow visualization builds and runs on Unix, Linux, Windows and Mac OS. vcblib provides a reliable and portable building block to deploy visualization operations to the wide area.

Executing vcblib ops on NFU NFU (Network Functional Unit) is a generic, best effort computation service Maximum memory size Limited duration of execution Weak semantics Strong services must be constructed on top (I.e. the scheduler of the parallel visualization algorithm)

Network Functional Unit code module RD Input Allocation PE RD/WR Memory mapping Output Allocation RD Input Allocation IBP MEM NFU is novel due to: 1. weakened semantic and 2. control of security-sensitive operations. 3. Supports both local and mobile code modules

d6 d10 d11 d12 d1 d3 d4 d9 Server:P2 d2 Server:P1 d5 d7 d8 Scheduling Huadong Liu, University of Tennessee Workstation A d1 d2 d4 d12 Task Queues Server:P3 Server:P4 Server:P5 Task Queues Workstation B d5 d6 d8 d9 Server:P6 d3 d7 d10 d11 Depots: {P1,P2,,Pm} Pi described by bw bi & computational power ci Partitioned dataset {d1,d2,, dn}, k-way replication Vis only needs one copy of each dj (Optional) DM tasks Mij replicates dj on Pi Key Challenge: Resource performance varies over time!!!

Scheduling Depots are ranked by number of volume partitions processed so far High vs. Low priority queues (HPQ vs. LPQ) of tasks HPQ: tasks-to-be-assigned, keyed by shortest potential processing time LPQ: tasks-already-assigned, keyed by longest potential wait time

Dynamic Data Movement Some data partitions are just unlucky to be on slow or heavily loaded servers After fast depots are done with local tasks, can dynamically steal some slow partitions Fast server selection <Pi > Partial multi-stream download /w deadline Main multi-stream data transfer <Tj, Pi > Slow task selection <Tj> Succeed? no yes Information update Deadline calculation Termination Termination

Results: the depots Most of our test depots are Planet-Lab nodes The machines workload varies much from one to one The workload is also highly time varying 100 50 One-minute Load 80 60 40 20 0 0 50 100 150 200 250 300 350 400 Server ID One-minute Load 40 30 20 10 0 0 225 450 675 900 1125 1350 Time (minutes)

Results: the data Test data: 30 timestep of Tera-scale Supernova Initiative, 75GB in total Provided by Tony Mezzacappa (ORNL) and John Blondin (ORNL) under the auspices of DOE SciDAC TSI project

Results: the performance 800x800 image resolution, 0.5 step size in ray-casting, per-fragment classification and Phong shading With 100 depots, the average rendering time: 237 sec

To the User You program your visualization by editing an XML file ASCII file, 3KB in size A template is provided

Let s go to the video