A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures
Afsin Akdogan, Hien To, Seon Ho Kim and Cyrus Shahabi
Integrated Media Systems Center, University of Southern California, Los Angeles, CA, USA
09/05/2014, 5th BPOE Workshop at VLDB
Outline
- Motivation
- MediaQ
- Benchmark Design
- Experimental Evaluation
- Conclusion and Future Directions
Motivation
Mobile revolution: smartphones, tablets, and wearable technologies.
There will be over 10 billion mobile devices by 2018; 54% of them will be smart devices, up from 21% in 2013.
Motivation
In line with the increase in mobile devices, the amount of mobile video grows rapidly, driving more cloud-based mobile media applications.
Recent trend in mobile media: not just content but also geospatial metadata, for better archiving and searching.
Geospatial metadata combines video files with metadata extracted using the sensors available on the device: camera location, camera direction, viewable angle, ...
Motivation
Mobile video metadata enables advanced querying features.
Future of mobile video: it will increase 14-fold by 2018, accounting for 69% of total mobile data traffic.
Cloud-based mobile media applications: how well can the cloud support such applications?
Outline: Motivation · MediaQ · Benchmark Design · Experimental Evaluation · Conclusion and Future Directions
MediaQ
An example of a resource-intensive mobile video application on the cloud [ACM MMSys 2014].
Functions: collect, search, and share user-generated mobile videos using automatically tagged geospatial metadata (location, direction, etc.).
Architecture
- Client side: Mobile App, Web App
- Web services: Uploading API, GeoCrowd API, User API, Search and Video Playing API
- Server side: video processing (transcoding, visual analytics, keyword tagging), GeoCrowd engine, account management, query processing (advanced querying: point, range, directional, etc.)
- Data store: content repository and metadata repository (MySQL databases, MongoDB)
Outline: Motivation · MediaQ · Benchmark Design · Experimental Evaluation · Conclusion and Future Directions
Benchmark Design
Goal: identify the appropriate server type for mobile video applications.
Challenge: plenty of server options are available (compute optimized, memory optimized, etc.), and prices vary considerably. In Azure, the most expensive server is 245 times more costly than the cheapest one. Which configuration is cost-effective?

Dollar-per-hour prices of the smallest and the largest servers in each server group:

Server type            | Amazon EC2        | Microsoft Azure   | Google Compute
                       | smallest  largest | smallest  largest | smallest  largest
General purpose (m)    | 0.07      0.56    | 0.02      0.72    | 0.077     1.232
Compute optimized (c)  | 0.105     1.68    | 2.45      4.9     | 0.096     0.768
Memory optimized (r)   | 0.175     2.8     | 0.33      1.32    | 0.18      1.44
Disk optimized (i)     | 0.853     6.82    | -         -       | -         -
Micro (t)              | 0.02      0.044   | -         -       | 0.014     0.0385
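The 245x figure follows directly from the Azure column of the table: the largest compute-optimized server ($4.9/h) against the smallest general-purpose one ($0.02/h). A minimal sketch of that check (prices copied from the table; they reflect 2014 list prices):

```python
# Azure dollar-per-hour prices from the table above: (smallest, largest)
# per server group. "-" entries (no offering) are omitted.
azure_prices = {
    "general purpose": (0.02, 0.72),
    "compute optimized": (2.45, 4.9),
    "memory optimized": (0.33, 1.32),
}

cheapest = min(p for pair in azure_prices.values() for p in pair)
priciest = max(p for pair in azure_prices.values() for p in pair)
ratio = priciest / cheapest  # 4.9 / 0.02 = 245
print(f"Azure price spread: {ratio:.0f}x")
```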
Benchmark Design Servers
Benchmark Design
Quantify the performance of the cloud for mobile media applications.
Methodology:
- Break the general video upload into sub-components.
- Define a cross-resource metric and compare the performance of all components using this single metric.
- Spot the component(s) that become the bottleneck in the workflow and choose server types accordingly to relieve the bottleneck; e.g., compute-optimized machines are designed for CPU-intensive tasks.
Benchmark Design
Video uploading with geospatial metadata: the video upload workflow consists of three phases.
1. Video transmission (network): upload the video file and the metadata extracted from the video from mobile clients to cloud servers.
2. Metadata insertion into the database: insert the metadata into relational database tables (e.g., MySQL).
3. Video transcoding: reduce the resolution of the video and change the container format if necessary (e.g., from MP4 to AVI).
Benchmark Design
Metric — throughput: processed frames per second.
Throughput in the workflow:
1. Video transmission: the number of uploaded frames per second.
2. Metadata insertion into the database: the number of frames whose metadata is inserted per second.
3. Video transcoding: the number of frames transcoded per second.
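The point of a single cross-resource metric is that every phase, whatever resource it stresses, reduces to the same number. A minimal sketch of how such a measurement harness might look (the phase stub and names are illustrative, not from the paper):

```python
import time

def throughput(frames_processed: int, elapsed_seconds: float) -> float:
    """Cross-resource metric: frames processed per second."""
    return frames_processed / elapsed_seconds

def measure(phase_fn, frames: int) -> float:
    """Time one workflow phase and report its throughput."""
    start = time.perf_counter()
    phase_fn()
    return throughput(frames, time.perf_counter() - start)

# Hypothetical phase that "processes" 3,000 frames in some elapsed time;
# network, database, and transcoding phases would each be measured this way.
rate = measure(lambda: time.sleep(0.01), frames=3000)
```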
Outline: Motivation · MediaQ · Benchmark Design · Experimental Evaluation · Conclusion and Future Directions
Experimental Evaluation
Initial overall performance analysis:
- Used the smallest servers of 4 server types on Amazon EC2.
- Enabled bulk insert (collect 1K records, then insert).
- Used ffmpeg, a leading transcoding library, for transcoding.
- Enabled multi-threading for database insertion and transcoding.
Observation: transcoding is the major bottleneck.
[Figure: throughput (log scale) of the network, database, and transcoding phases on m-small (general purpose), c-small (compute optimized), r-small (memory optimized), and i-small (disk optimized) servers; a) single video, b) multiple videos]
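The bulk-insert setting above (collect 1K records, then insert) can be sketched as batched `executemany` calls with one commit per batch. This is an illustrative stand-in using SQLite so it is self-contained; the paper uses MySQL, and the metadata schema below is invented for the example:

```python
import sqlite3

BATCH_SIZE = 1000  # slide setting: collect 1K records, then insert

# Stand-in for the MySQL metadata table; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE metadata (
    video_id INTEGER, frame_no INTEGER,
    lat REAL, lng REAL, direction REAL)""")

def bulk_insert(rows):
    """Insert metadata rows in 1K batches instead of one-by-one."""
    for i in range(0, len(rows), BATCH_SIZE):
        conn.executemany(
            "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
            rows[i:i + BATCH_SIZE])
        conn.commit()  # one commit per batch amortizes transaction overhead

# Made-up per-frame metadata for one video.
rows = [(1, f, 34.02, -118.28, f % 360) for f in range(2500)]
bulk_insert(rows)
```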
Experimental Evaluation
Transcoding performance — first, multi-threading:
- MT: enable multi-threading on a single video and transcode it as fast as possible.
- PST: run multiple single-threaded transcoding tasks in parallel.
Observation: ffmpeg does not scale up as the number of threads increases (see Figure a).
[Figure: a) total time (sec) of multi-threaded ffmpeg on the same video for MP4 and AVI output vs. thread count, 32 vCPUs; b) throughput of multi-thread (MT) vs. parallel single-thread (PST) ffmpeg vs. thread count]
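The PST pattern amounts to pinning each ffmpeg task to one thread (`-threads 1`) and getting parallelism from running many processes side by side. A minimal sketch that builds the per-video commands (file names are illustrative; the commands are constructed but not executed here):

```python
def pst_commands(videos, out_ext="avi"):
    """Build one single-threaded ffmpeg command per input video (PST)."""
    cmds = []
    for v in videos:
        out = v.rsplit(".", 1)[0] + "." + out_ext
        # -threads 1 limits this ffmpeg process to a single thread
        cmds.append(["ffmpeg", "-threads", "1", "-i", v, out])
    return cmds

cmds = pst_commands(["a.mp4", "b.mp4", "c.mp4"])
# Each command would then be launched concurrently, e.g.:
#   procs = [subprocess.Popen(c) for c in cmds]
#   for p in procs: p.wait()
```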
Experimental Evaluation
Transcoding — second, reducing video quality. The input video resolution is 960x540; we cut the size of each dimension in half at each experiment.
Observation: throughput increases as the resolution decreases. However, the percentage improvement diminishes when the output resolution becomes too small, because loading the input video frame by frame is a constant cost that largely contributes to the total transcoding cost.

Output resolution | MP4 throughput (% improvement) | AVI throughput (% improvement)
480x270           | 623 (-)                        | 626 (-)
240x136           | 842 (35%)                      | 839 (34%)
120x68            | 980 (57%)                      | 982 (57%)
60x34             | 1038 (66%)                     | 1048 (67%)
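The halving schedule above (960x540 down to 60x34, with 135 and 67.5 rounded up to the even values 136 and 68) can be generated mechanically, using ffmpeg's `scale` video filter to set the output size. A sketch assuming a hypothetical `input.mp4`; the commands are built, not run:

```python
def half_even(dim):
    """Halve a dimension, rounding odd results up (ffmpeg prefers even sizes)."""
    d = dim // 2
    return d + 1 if d % 2 else d

def scale_commands(src, width=960, height=540, steps=4):
    """Build one ffmpeg scaling command per halving step."""
    cmds = []
    w, h = width, height
    for _ in range(steps):
        w, h = half_even(w), half_even(h)
        cmds.append(["ffmpeg", "-i", src,
                     "-vf", f"scale={w}:{h}", f"out_{w}x{h}.mp4"])
    return cmds

cmds = scale_commands("input.mp4")
# Output sizes: 480x270, 240x136, 120x68, 60x34 — matching the table.
```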
Experimental Evaluation
Database: the average length of a row is 319 bytes, with no index on the metadata table. We used the smallest and the largest servers of 4 different server types.
Observation: metadata insertion is I/O-intensive, but disk-optimized machines do not decisively outperform the others because video upload is mostly append-only. Disk-optimized instances are tuned to provide fast random I/O; however, append-only workloads make little use of random access. With the largest servers, multi-threading is enabled, and the compute-optimized server handles multiple threads better; therefore, it outperforms the others.
[Figure: database insertion throughput on the smallest and largest servers of the m (general purpose), c (compute optimized), r (memory optimized), and i (disk optimized) types]
Experimental Evaluation
Database insertion with index: B-tree and hash.
Experimental Evaluation
Performance-cost: number of frames per dollar.
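The performance-cost metric combines the throughput measurements with the hourly prices from the earlier table. A minimal sketch of the conversion (the example figures are made up, not results from the paper):

```python
def frames_per_dollar(throughput_fps: float, price_per_hour: float) -> float:
    """Frames processed per dollar: frames/second * 3600 s/h / ($/h)."""
    return throughput_fps * 3600 / price_per_hour

# Illustrative only: 1,000 frames/s on a $0.105/h c-type server.
fpd = frames_per_dollar(1000, 0.105)
```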
Outline: Motivation · MediaQ · Benchmark Design · Experimental Evaluation · Conclusion and Future Directions
Conclusion and Future Directions
Conclusion:
- Investigated mobile video upload performance in a cloud environment.
- Transcoding (a CPU-intensive task) is the major bottleneck.
- Compute-optimized servers provide the best performance.
- The ffmpeg library does not scale up linearly; therefore, the best way to utilize multicore CPUs is to run multiple single-threaded ffmpeg tasks.
Future directions:
- Partitioning the dataset and scaling out to multiple servers.
- Extending the metric to measure other system processes, such as query processing.
Q & A Seon Ho Kim, Ph.D. seonkim@usc.edu Thank you!