Transforming cloud infrastructure to support Big Data
Ying Xu, Aspera, Inc.



Presenters and Agenda
Presenter: Ying Xu, Principal Engineer, Aspera R&D (ying@asperasoft.com)
Agenda:
- Challenges in Moving Big Data
- Aspera Cloud Technology
- Use Cases
- Q & A

Aspera's mission
Creating next-generation transport technologies that move the world's digital assets at maximum speed, regardless of file size, transfer distance and network conditions.

Trends in Technology
Big Data Explosion
- 90% of data today is file-based or unstructured
- Mix of file sizes, but larger and larger files are the norm
Diversity of IP Networks: Media, Bandwidth Rates, and Conditions
- Variable bandwidth rates (slow to super-fast)
- Bandwidth rates increasing, costs decreasing
- Network media remains diverse (terrestrial, satellite, wireless)
- Conditions vary; all networks are prone to degradation over distance
Global Workflows: moving Big Data over WANs
- Teams are geographically dispersed
- Over distance, network conditions degrade
- Contemporary TCP acceleration solutions are not designed for big data transfer and replication
Cloud Computing Grows Up
- Amazon Web Services (AWS) S3 cloud storage: 262 billion objects in 2010, 1.3 trillion objects in 2012
- More choices: Microsoft Azure, OpenStack, HP Cloud
- No longer a niche: Netflix (transcoding), MTV (global video distribution), BGI (genomic sequencing)

Underlying technology bottlenecks
Transport protocols
- TCP and TCP variants
- Strongest constraint: often under 10 Mbps on commercial WAN
Storage
- Not designed for single high-speed readers and writers
- Often constrains transfers to 100s of Mbps
- Cloud storage: slow I/O at the file system
Computer architecture
- Commodity hardware has limitations in processing I/O interrupts
- Often constrains transfers to 1-2 Gbps
[Diagram: the three bottlenecks numbered along the transfer path, with labels "WAN" and "Last Foot Bottleneck"]

Challenges with TCP and alternatives
Distance degrades conditions on all networks
- Latency (round-trip time) increases
- Packet loss increases
- Fast networks are just as prone to degradation
TCP performance degrades with distance
- The throughput bottleneck becomes more severe with increased latency and packet loss
TCP does not scale with bandwidth
- TCP was designed for low bandwidth
- Adding more bandwidth does not improve throughput
Alternative technologies
- TCP-based: network latency and packet loss must be low
- Modified TCP: improves TCP performance but is insufficient for fast networks
- UDP traffic blasters: inefficient and waste bandwidth
- Data caching: inappropriate for many large file transfer workflows
- Data compression: time consuming and impractical for certain file types
- CDNs & co-lo build-outs: high overhead and expensive to scale
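To make the "TCP does not scale with bandwidth" point concrete, here is a minimal sketch using the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≈ C · MSS / (RTT · sqrt(p)). The slides do not cite this formula; the constant C ≈ 1.22, the 1460-byte MSS, and the sample RTT/loss pairs are assumptions chosen for illustration. Link bandwidth does not appear in the bound at all.

```python
import math


def tcp_throughput_bps(mss_bytes: float, rtt_s: float, loss: float, c: float = 1.22) -> float:
    """Approximate upper bound on steady-state TCP throughput in bits/s (Mathis et al.)."""
    if rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("rtt_s must be positive and loss must be in (0, 1]")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


if __name__ == "__main__":
    # Illustrative (RTT in ms, loss fraction) pairs; not taken from the slides.
    for rtt_ms, loss in [(10, 0.001), (100, 0.01), (250, 0.03)]:
        mbps = tcp_throughput_bps(1460, rtt_ms / 1000, loss) / 1e6
        print(f"RTT {rtt_ms:>3} ms, loss {loss:.1%}: about {mbps:.2f} Mbps")
```

Under these assumptions, the bound drops from tens of Mbps on a short, clean path to well under 1 Mbps on a long, lossy one, regardless of how much bandwidth the link offers.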

Solving the transfer bottleneck
fasp(TM): a reliable bulk data transport protocol that completely separates reliability and rate control
- Uses standard UDP in the transport layer
- Uses a theoretically optimized approach that retransmits precisely the real packet loss on the channel
- Uses separate control systems for reliability (retransmission of dropped data) and rate control, both optimized to achieve the highest possible effective rate without cost
Rate-based congestion control with packet delay (not loss) as congestion feedback
- Detects incipient congestion faster
- Instead of a sliding window, packets are scheduled for sending based on a calculated rate, governed by a rate controller
Decoupling reliability and rate control
- Unlike TCP, new packets need not slow down for the retransmission of lost data
- Lost data is retransmitted at the available bandwidth inside the end-to-end path, with nearly zero duplicates

Rate-based congestion control
- The network is a shared resource, and each flow imposes a price on others by using it
- Assumes each link in the network has a fixed bandwidth (BW)
- Based on the congestion and queue size, generates a price for the link (the queuing delay at a router)
- The rate controller then (mathematically) drives each flow to a rate that maximizes the sum of the utilities (rate / congestion price) of all flows, without exceeding the BW capacity
- When there is no congestion, the rate controller drives the transfer rate towards BW as fast as possible
[Diagram: queue size q over time t, with marks at Qmax, t0 and t1]
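The idea of using queuing delay as the congestion price can be sketched with a generic delay-based rate update. This is not Aspera's algorithm, which the slides do not specify; the update rule, the `alpha` target (the amount of data a flow is willing to keep queued), and the `gain` value are all assumptions for illustration. The rate rises while measured queuing delay is low and falls once `rate * delay` exceeds `alpha`.

```python
def queuing_delay_s(rtt_s: float, base_rtt_s: float) -> float:
    """Queuing delay inferred as the RTT in excess of the smallest RTT observed."""
    return max(0.0, rtt_s - base_rtt_s)


def next_rate_mbps(rate_mbps: float, delay_s: float, alpha_mbit: float,
                   gain: float = 0.5, min_rate: float = 0.1,
                   max_rate: float = 10_000.0) -> float:
    """One step of a hypothetical delay-based controller.

    The rate is steady when rate * delay == alpha, increases when the
    path shows little queuing, and decreases when queuing delay grows.
    """
    adjusted = rate_mbps + gain * (alpha_mbit - rate_mbps * delay_s)
    return min(max_rate, max(min_rate, adjusted))


if __name__ == "__main__":
    base_rtt = 0.100  # assumed propagation RTT in seconds
    for rtt in (0.100, 0.110, 0.120):
        delay = queuing_delay_s(rtt, base_rtt)
        print(f"RTT {rtt * 1000:.0f} ms -> next rate {next_rate_mbps(100.0, delay, 1.0):.2f} Mbps")
```

Because the feedback signal is delay, not loss, a controller of this kind can react before the router queue overflows, which is the property the slide highlights.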

fasp: high-performance transport
Maximum transfer speed
- Optimal end-to-end throughput efficiency
- Transfer performance scales with bandwidth, independent of transfer distance and resilient to packet loss
- Automatic, full utilization of available bandwidth
Optimized reliability
- Less than 0.1% retransmission overhead at 30% packet loss
- High performance with large files or large sets of small files
Uncompromising security
- Secure user/endpoint authentication
- AES-128 cryptography in transit and at rest
Resulting in
- Transfers up to thousands of times faster than FTP
- Precise and predictable transfer times
- Extreme scalability (concurrency and throughput)

fasp(TM) performance breakthrough

FTP transfer times
File size | Across US | US-EU     | US-Asia     | Satellite
1 GB      | 1-2 hrs   | 2-4 hrs   | 4-20 hrs    | 8-20 hrs
10 GB     | 15-20 hrs | 20-40 hrs | Impractical | Impractical
100 GB    | Impractical | Impractical | Impractical | Impractical
TCP transfer times are limited by packet loss and delay (network distance), NOT BANDWIDTH.

fasp transfer times
File size | 2 Mbps   | 10 Mbps  | 45 Mbps  | 100 Mbps | 200 Mbps | 1 Gbps
1 GB      | 70 min   | 14 min   | 3.2 min  | 1.4 min  | 42 sec   | 8.4 sec
10 GB     | 11.7 hrs | 140 min  | 32 min   | 14 min   | 7 min    | 1.4 min
100 GB    | -        | 23.3 hrs | 5.3 hrs  | 2.3 hrs  | 1.2 hrs  | 14 min
Aspera transfer times shorten linearly with bandwidth, independent of packet loss and delay (network distance).
- Cross US: add 1% to 5%
- Intercontinental: add 1% to 10%
- Satellite: add 1% to 10%
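The fasp figures can be checked with simple arithmetic: transfer time is file size divided by line rate, plus a small overhead. The sketch below assumes decimal units (1 GB = 8e9 bits) and a 5% overhead; both are assumptions, chosen because they appear consistent with several of the quoted values.

```python
def transfer_time_s(size_gb: float, rate_mbps: float, overhead: float = 0.0) -> float:
    """Time in seconds to move size_gb (decimal GB) at rate_mbps with fractional overhead."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    return size_gb * 8e9 / (rate_mbps * 1e6) * (1.0 + overhead)


if __name__ == "__main__":
    for size_gb in (1, 10, 100):
        for rate_mbps in (2, 10, 45, 100, 200, 1000):
            seconds = transfer_time_s(size_gb, rate_mbps, overhead=0.05)
            print(f"{size_gb:>3} GB at {rate_mbps:>4} Mbps: {seconds / 60:9.1f} min")
```

For example, 1 GB at 1 Gbps takes 8 s on an ideal link and 8.4 s with 5% overhead, and 100 GB at 10 Mbps comes to roughly 23.3 hours.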

Advanced bandwidth management
Virtual bandwidth cap
- Virtual link: distributed rate control that allows flows to share a portion of link bandwidth while leaving the rest for other traffic
- QoS-style control without support from the router or ANY hardware
- Preserves bandwidth for other applications (VoIP, video)
De-centralized architecture
- Each flow infers the aggregate traffic from intermittent multicast
- The aggregate traffic is compared with the preset virtual link capacity to infer a virtual queuing delay
- The virtual queuing delay is fed back as the congestion price, which eventually governs the transfer rate
Flexible control
- BW cap capacity can be changed according to a schedule
In summary: through distributed multicast, flows calculate the aggregate sending rate and compare it to the virtual bandwidth target value to infer a virtual queuing delay.
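The virtual link mechanism can be sketched as a queue that exists only in software. The functions below are a hypothetical illustration, not Aspera's implementation: each flow sums the rates announced by its peers, lets a virtual queue grow when the aggregate exceeds the configured cap, and converts the queue length into a virtual queuing delay that a rate controller could use as its congestion price.

```python
def update_virtual_queue(vq_mbit: float, aggregate_mbps: float,
                         virtual_capacity_mbps: float, dt_s: float) -> float:
    """Grow the virtual queue when the aggregate exceeds the cap, drain it otherwise."""
    return max(0.0, vq_mbit + (aggregate_mbps - virtual_capacity_mbps) * dt_s)


def virtual_queuing_delay_s(vq_mbit: float, virtual_capacity_mbps: float) -> float:
    """Time the virtual link would need to drain its queue."""
    return vq_mbit / virtual_capacity_mbps


if __name__ == "__main__":
    announced_mbps = [30.0, 40.0, 50.0]  # rates peers reported (assumed values)
    cap_mbps = 100.0                     # virtual link capacity (assumed value)
    vq = update_virtual_queue(0.0, sum(announced_mbps), cap_mbps, dt_s=0.1)
    print(f"virtual queue: {vq:.1f} Mbit, "
          f"virtual delay: {virtual_queuing_delay_s(vq, cap_mbps) * 1000:.0f} ms")
```

No router involvement is needed in this scheme: the only inputs are the rates the flows report to each other and the configured cap.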

Solving the Transport Bottleneck: Advanced Bandwidth Sharing
Flexible bandwidth sharing
- Allows applications to build intentional control into their transport service
- The built-in response to network congestion provides a virtual handle for differentiated BW sharing
- User-specified high/low priority
- Example: flows 1 and 2 use utility function U1(x); flow 3 uses U3(x) = 2 U1(x)
Advanced management functionality
- Priority changes on the fly
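The effect of scaling a flow's utility function can be illustrated with a standard result from network utility maximization. The slides do not give the form of U1(x); assuming a logarithmic utility w·log(x), maximizing the sum of utilities on one shared link gives each flow a share proportional to its weight w. Under that assumption, a flow with U3 = 2·U1 receives twice the rate of flows 1 and 2.

```python
def weighted_shares(capacity_mbps: float, weights: dict) -> dict:
    """Split capacity in proportion to weights (the optimum for w * log(x) utilities)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: capacity_mbps * w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Flow 3 carries twice the weight, mirroring U3(x) = 2 * U1(x).
    shares = weighted_shares(100.0, {"flow1": 1.0, "flow2": 1.0, "flow3": 2.0})
    for name, rate in shares.items():
        print(f"{name}: {rate:.0f} Mbps")
```

Changing a weight and recomputing the shares corresponds to the "priority changes on the fly" bullet: only a parameter of the controller changes, not the network.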

Challenges for Storing Big Files in Cloud Storage
Large files are divided into chunks
- Typically 64 MB - 128 MB, with multiple replicas distributed across the storage
- A 1 TB file requires 10,000 chunks at 100 MB per chunk
I/O protocol is HTTP only
- HTTP PUT or GET by chunk
- SLOW over the WAN due to the TCP throughput bottleneck
- Even local I/O is slow unless parallel HTTP streams are used for write/read; local file system drivers such as S3 FUSE are notably slow, e.g. 8-10 Megabytes/s
Security and access control is only as good as the application
Simply no tools for inter-cloud data transfer: lock-in
Media use cases need high-speed transfer, virtually unlimited size, robust performance & security
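The chunk arithmetic on this slide is easy to reproduce. The sketch below assumes decimal units (1 TB = 10^12 bytes, 100 MB = 10^8 bytes), which is what yields the 10,000 figure; the function names are illustrative only.

```python
def chunk_count(file_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to hold file_bytes, rounding up."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return -(-file_bytes // chunk_bytes)


def chunk_ranges(file_bytes: int, chunk_bytes: int):
    """Yield (start, end) byte offsets for each chunk; end is exclusive."""
    for start in range(0, file_bytes, chunk_bytes):
        yield start, min(start + chunk_bytes, file_bytes)


if __name__ == "__main__":
    one_tb = 10**12
    print(chunk_count(one_tb, 100 * 10**6))   # 100 MB chunks
    print(chunk_count(one_tb, 64 * 2**20))    # 64 MiB chunks
    print(list(chunk_ranges(250, 100)))
```

Each of those chunks becomes a separate HTTP PUT or GET, so per-request latency is paid thousands of times for a single large file.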

Big data cloud storage challenge

fasp 3: new-generation high-speed transport
Maximized efficiency for cloud object stores
- Small-file metadata transmission moved from TCP to FASP, so maximal efficiency is achieved independent of file size
Parallel I/O architecture optimized for multi-core CPUs
- Number of I/O workers is configurable to fit native storage system characteristics
- Multi-stage I/O processing opens the door to in-line encryption and LZO compression
Parallel HTTP forwarding overcomes the last-foot storage bottleneck
- Incoming FASP traffic is forwarded to internal storage via a parallel HTTP API
- Number of HTTP streams is tuned to optimize throughput
- Gbps transfer speeds achieved for WAN upload/download to the cloud
Infrastructure agnostic
- Virtual file system adapter supports cloud and on-premise storage
- Hybrid cloud and on-premise architecture
[Diagram: cloud, hybrid and on-premise deployments]
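The parallel forwarding idea can be sketched with a thread pool that writes fixed-size parts concurrently. This is a generic illustration, not Aspera's implementation: `put_part` is a hypothetical callable standing in for one HTTP PUT of one part, and `InMemoryStore` replaces a real object store so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Stand-in for an object store; keeps uploaded parts by index."""

    def __init__(self):
        self.parts = {}

    def put_part(self, index: int, payload: bytes) -> None:
        self.parts[index] = payload

    def assemble(self) -> bytes:
        return b"".join(self.parts[i] for i in sorted(self.parts))


def forward_parallel(data: bytes, part_size: int, put_part, workers: int = 4) -> int:
    """Split data into parts and hand them to put_part on a pool of worker threads."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = [(offset // part_size, data[offset:offset + part_size])
             for offset in range(0, len(data), part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(lambda part: put_part(*part), parts))
    return len(parts)


if __name__ == "__main__":
    store = InMemoryStore()
    count = forward_parallel(b"0123456789" * 10, part_size=16, put_part=store.put_part)
    print(count, store.assemble() == b"0123456789" * 10)
```

The `workers` parameter plays the role of the tunable number of HTTP streams: with real network I/O, more streams hide per-request latency until the storage backend or the instance's network becomes the limit.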

Overcoming both bottlenecks
#1 Transfer data to EC2 over WAN: effective throughput
(Typical internet conditions: 50-250 ms latency & 0.1-3% packet loss)
- HTTP transfer over WAN (single stream): <10 Mbps
- 15 parallel HTTP streams: <10 to 100 Mbps
- Aspera fasp transfer over WAN to EC2: up to 1 Gbps (per EC2 Extra Large instance)
#2 Transfer data from EC2 to S3: effective throughput
- Standard single-stream HTTP: 10 to 100 Mbps
- Aspera S3 Proxy with parallel I/O HTTP streams: up to 1 Gbps (per EC2 Extra Large instance), 10 TB transferred per 24 hours
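The "10 TB transferred per 24 hours" figure can be related to the "up to 1 Gbps" figure with a short calculation. The sketch assumes decimal terabytes (10^12 bytes) and a rate sustained evenly over the whole day.

```python
def required_rate_mbps(total_bytes: float, seconds: float) -> float:
    """Average rate in Mbps needed to move total_bytes within the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    rate = required_rate_mbps(10 * 10**12, 24 * 3600)
    print(f"10 TB in 24 hours needs about {rate:.0f} Mbps sustained")
```

Under those assumptions the daily volume corresponds to roughly 926 Mbps, which is in line with a link running close to 1 Gbps around the clock.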

Aspera On Demand: High-Speed Ingest with Direct-to-Cloud

Intra-cloud Transfers: across same or different cloud infrastructure
The solution
- Data migration from one region to another or from one provider to another
- Transfer database or application logs from one region to another for DR or business continuity
[Diagram: fasp transfers between Aspera Nodes in US West and US East]

Aspera product portfolio
- Transfer clients: high-speed transfers for web, desktop and mobile
- Web applications: file sharing, collaboration and exchange applications
- Management & automation: transfer management, monitoring and automation
- Synchronization: scalable, multi-directional, multi-node synchronization
- Transfer servers: high-speed file transfer servers for on-premise, private, public, and hybrid cloud deployments
- FASP transport: innovative, patented, highly efficient bulk data transport technology, unique and core to all Aspera products

Aspera Developer Network
Aspera mobile APIs
- Android SDK: provides a Java API to transfer files using fasp-air
- iPhone SDK: provides an Objective-C API to transfer files using fasp-air
Aspera browser APIs
- Connect JavaScript API: JavaScript API exposed by Aspera Connect for integrating fasp-based file transfers into web applications for a complete in-browser experience
Aspera application APIs
- Shares API: full programmatic control over browsing Shares, transfer authorization, and upload/download
- faspex Web API: a set of services that enables users to create and receive digital deliveries via a Web interface, while taking advantage of fasp high-speed transfer technology
- Console API: full programmatic management of transfer sessions, including initiation, queuing, management and control, through a RESTful API
- Aspera Web Services: a SOAP-based web service API that allows initiation, monitoring and control of fasp-based file transfers
Aspera transfer APIs
- fasp Manager: a class library that allows initiation, monitoring and control of fasp-based file transfers
- Aspera Multicast SDK: a Java class library for initiation and management of IP multicast-based data transmissions using Aspera fasp-mc

Ingest and Sharing: hybrid across public & private clouds
The solution
- The Shares web app transparently communicates with Aspera server Nodes and displays content in a single user interface
- The user browses authorized content across multiple shares
- Independent high-speed data transfers to/from the datacenter, AWS S3, and Windows Azure BLOB, transparent to the user
[Diagram: client (NY, NY), Shares, DMZ Node, Node and datacenter (Emeryville, CA), connected via fasp]

Automated End-to-end Workflow: hybrid across public & private clouds
The solution: automate with Aspera Console and Aspera Orchestrator
1. Content is transferred and ingested to an on-premise Aspera server
2. Aspera high-speed transfer Direct-to-S3 with Aspera On Demand
3. Multiple parallel transcoding jobs, with output stored back to S3
4. Faspex packages are created, sent and downloaded by the customer
5. Media files are archived to Azure BLOB or AWS S3 / Glacier
[Diagram: media company, DMZ, Aspera Nodes, S3, transcoding service, Faspex On Demand, media customer, BLOB and Glacier, across the stages Ingest, Transform, Distribute, Archive]

THANK YOU FOR JOINING!
Ying Xu, Principal Engineer, Aspera R&D
Questions?