Solutions for High-speed Data Transfer: Moving Data in the Era of Big Data AIRI Petabyte Challenge Plenary Session 2 April 5, 2012
PRESENTER John Heaton, Regional Manager of Sales Engineering, john@asperasoft.com AGENDA Aspera Overview; Trends and Challenges; Aspera Solutions
Aspera's mission: Creating next-generation transport technologies that move the world's digital assets at maximum speed, regardless of file size, transfer distance and network conditions.
Trends and Challenges Big Data Explosion 90% of data today file-based or unstructured Mix of file sizes but larger and larger files the norm Changing Computing Environment Data center consolidation of once disparate systems Move from on-premises data centers to multi-site and cloud computing architectures Data lives across heterogeneous storage platforms (file- and object-based)
Trends and Challenges (cont) Diversity of IP Networks: Media, Bandwidth Rates, and Conditions Variable bandwidth rates (slow to super-fast) Bandwidth rates increasing, costs decreasing Network media remains diverse (terrestrial, satellite, wireless) Conditions vary; all networks are prone to degradation over distance Data Freighting Challenges: moving Big Data over WANs End-users and collaboration teams are geographically dispersed Over distance, conditions degrade Contemporary TCP acceleration solutions are not designed for big data transfer and replication
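Why distance and packet loss hurt TCP can be illustrated with the widely used Mathis et al. approximation for steady-state TCP throughput, roughly 1.22 x MSS / (RTT x sqrt(loss)). The sketch below is not from the slides: the 1460-byte MSS and the sample latency and loss values are assumptions chosen for illustration, and real stacks with window scaling or different congestion control will deviate from the model.

```python
import math

def tcp_throughput_mbps(rtt_ms: float, loss_rate: float, mss_bytes: int = 1460) -> float:
    """Approximate single-stream TCP throughput (Mathis et al. model), in Mbps."""
    if rtt_ms <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_ms must be > 0 and loss_rate in (0, 1]")
    rtt_s = rtt_ms / 1000.0
    # The bound depends only on segment size, round-trip time and loss,
    # not on the capacity of the link.
    bits_per_s = 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))
    return bits_per_s / 1e6

if __name__ == "__main__":
    # Assumed sample conditions: (round-trip time in ms, packet loss rate)
    for rtt_ms, loss in [(10, 0.001), (50, 0.001), (100, 0.01), (250, 0.03)]:
        rate = tcp_throughput_mbps(rtt_ms, loss)
        print(f"RTT {rtt_ms:>3} ms, loss {loss:.1%}: ~{rate:.2f} Mbps")
```

Under this model a single stream at 50 ms and 0.1% loss is capped near 9 Mbps whatever the link capacity, which is the same order of magnitude as the single-stream HTTP figure quoted later in the deck.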
fasp High-performance Data Transport Maximum line-rate WAN transfer speed Transfer performance scales with bandwidth independent of transfer distance and resilient to packet loss Optimal end-to-end throughput efficiency Congestion Avoidance and Policy Control Automatic, full utilization of available bandwidth On-the-fly prioritization and bandwidth allocation Uncompromising security and reliability Secure, user/endpoint authentication AES-128 cryptography in transit & at-rest Scalable management, monitoring and control Real-time progress, performance and bandwidth utilization Detailed transfer history, logging, and manifest Enterprise-Class File Delivery Transfers up to thousands of times faster than FTP/HTTP(S) Precise and predictable transfer times Extreme scalability (concurrency and throughput)
Aspera Software Product & Technology Portfolio DISTRIBUTE COLLABORATE AUTOMATE Complete portfolio of servers and clients for high-speed data delivery and distribution. Global person-to-person and project-based exchange and collaboration of files and directories. Web-based application and SDK for creating and managing automated file-based workflows. TRANSPORT Our unique, patented fasp™ transport technologies provide unparalleled speed, efficiency, and bandwidth control over any size, distance, and network.
fasp Software Environment
Cloud Computing: Why Is It So Compelling? THE POTENTIAL OF INFINITE COMPUTING RESOURCES, ON DEMAND Eliminates the need to plan ahead Allows companies to meet demand without the lead-time bottleneck THE ELIMINATION OF AN UP-FRONT COMMITMENT Reduce capital outlay and investment risk Start small & increase h/w resources to match need Auto-scale to meet demand PAY-FOR-USE RESOURCE MODEL CPUs by the hour Storage by the day Bandwidth by the GB
AWS S3's Adoption: 762 Billion Objects Now 762 Billion This represents year-over-year growth of 192% At 449 Billion objects (in July 2011) this was: 1,440 objects for every resident of the US 64 objects for each person on Planet Earth ~ the number of stars in the Milky Way http://aws.typepad.com/aws/2012/01/amazon-s3-growth-for-2011-now-762-billion-objects.html
Big Data Challenges: Awareness Is Growing "The Age of Big Data" By STEVE LOHR Published: February 11, 2012 "...Most of the Big Data surge is data in the wild, unruly stuff like words, images and video on the Web and those streams of sensor data. It is called unstructured data and is not typically grist for traditional databases..." http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?scp=1&sq=big%20data&st=cse "Obama's Big Data Plans: Lots of Cash and Lots of Open Data" By DERRICK HARRIS Published: March 29, 2012 "The White House on Thursday morning released the details of its new big data strategy, and they involve access to funding and data for researchers. It's a big financial commitment in a time of tight budgets, well over $200 million a year, but the administration is banking on big data techniques having revolutionary effects on par with the Internet, which federal dollars financed decades ago." http://gigaom.com/cloud/obamas-big-data-plans-lots-of-cash-and-lots-of-open-data/
What Constitutes Big Data
First Major Bottleneck: WAN Transfer
Second Major Bottleneck: Local HTTP I/O
2010: Big-data, Meet AWS S3 Dec 2010: AWS Announces Major S3 Upgrade Maximum S3 object size increased from 5 GB to 5 TB AWS introduces multipart HTTP API API available in Java, .NET, PHP & REST Fantastic, but now what? Still HTTP over the WAN (SLOW) Still have to glue any fasp high-speed transfer to S3 I/O in custom s/w: big speed bump! Find an expert s/w team Build upon the multipart API Concurrently stream data to S3 Integrate into operations
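The "build upon the multipart API" and "concurrently stream data to S3" steps can be sketched as follows. This is a minimal illustration and not Aspera's implementation: `upload_part` is a hypothetical callable standing in for a real S3 client call, and the 64 MiB default part size and worker count are arbitrary assumptions. The 5 MiB minimum part size and the 10,000-part ceiling reflect S3's documented multipart limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

MIN_PART = 5 * 1024**2   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000       # S3 maximum number of parts per multipart upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)

def plan_parts(total_size: int, part_size: int = 64 * 1024**2) -> List[Part]:
    """Split an object of total_size bytes into numbered, contiguous parts."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the object would otherwise need too many parts.
    while -(-total_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts: List[Part] = []
    offset, number = 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts

def upload_concurrently(parts: List[Part],
                        upload_part: Callable[[Part], str],
                        workers: int = 8) -> List[Tuple[int, str]]:
    """Run upload_part on every part in parallel; return (part_number, tag) in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tags = list(pool.map(upload_part, parts))  # map preserves input order
    return [(part[0], tag) for part, tag in zip(parts, tags)]

if __name__ == "__main__":
    def fake_upload(part: Part) -> str:
        # Hypothetical stand-in: a real uploader would send the byte range
        # and return the tag the storage service reports for that part.
        return f"etag-{part[0]}"

    plan = plan_parts(200 * 1024**2)
    print(upload_concurrently(plan, fake_upload))
```

Keeping the planning step separate from the upload step makes the part layout easy to test without network access, and lets the same plan drive a retry of only the parts that failed.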
Solution: Aspera On-demand Direct-to-S3
Overcoming Both Bottlenecks
#1 TRANSFER DATA TO EC2 OVER WAN: EFFECTIVE THROUGHPUT (typical internet conditions: 50 to 250 ms latency & 0.1 to 3% packet loss)
HTTP transfer over WAN (single stream): <10 Mbps
15 parallel HTTP streams: <10 to 100 Mbps
Aspera fasp transfer over WAN to EC2: up to 1 Gbps (per EC2 Extra Large Instance)
#2 TRANSFER DATA FROM EC2 TO S3: EFFECTIVE THROUGHPUT
Standard single-stream HTTP: 10 to 100 Mbps
Aspera S3 Proxy with parallel I/O HTTP streams: up to 1 Gbps (per EC2 Extra Large Instance)
10 TB transferred per 24 hours
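The two-bottleneck argument can be checked with simple arithmetic: end-to-end throughput is bounded by the slower of the WAN stage and the EC2-to-S3 stage, and a sustained rate converts directly into a daily volume. The sketch below uses figures from the slide; decimal units (1 TB = 10^12 bytes), zero protocol overhead and the particular scenario pairings are assumptions made for illustration.

```python
def end_to_end_mbps(wan_mbps: float, s3_io_mbps: float) -> float:
    """A two-stage pipeline runs no faster than its slower stage."""
    return min(wan_mbps, s3_io_mbps)

def tb_per_day(rate_mbps: float) -> float:
    """Volume in TB (10**12 bytes) moved in 24 hours at a sustained rate."""
    bits = rate_mbps * 1e6 * 24 * 3600
    return bits / 8 / 1e12

if __name__ == "__main__":
    # (WAN stage Mbps, EC2-to-S3 stage Mbps); pairings are illustrative
    scenarios = {
        "single-stream HTTP on both stages": (10, 100),
        "fasp over WAN, single-stream HTTP to S3": (1000, 100),
        "fasp over WAN, parallel HTTP streams to S3": (1000, 1000),
    }
    for name, (wan, s3) in scenarios.items():
        rate = end_to_end_mbps(wan, s3)
        print(f"{name}: {rate:.0f} Mbps, ~{tb_per_day(rate):.2f} TB/day")
```

At a sustained 1 Gbps on both stages this works out to about 10.8 TB per day, consistent with the slide's figure of 10 TB transferred per 24 hours. Fixing only the WAN stage leaves the pipeline limited by the single-stream S3 write.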
Aspera Direct-to-S3 Technology Advantages Unrivaled Aspera Performance Built on Aspera fasp technology for maximum transfer speed Regardless of file size, transfer distance and network conditions Precise bandwidth control ensures the available bandwidth is utilized to achieve maximum transfer speeds, while being fair to other business-critical network traffic Seamless integration with S3 Integrated with S3 multi-part HTTP for maximum last foot performance Simple configuration of S3 credentials, for both shared and dedicated docroot Transfers directly into S3 are seamless and transparent to user Enterprise-grade Security and Reliability Secure authentication with encryption in transit & at rest (AES-128, FIPS 140-2, HIPAA compliant) Packet-level data integrity verification Automatic resume of partial or failed transfers Full support for AWS S3 server-side encryption at rest Interoperates with all Aspera host options Any platform (Windows, Linux, Mac, UNIX, iOS, Android) Any Aspera Clients (CLI, Desktop, Point-to-Point, Mobile, Web, Embedded) Any Aspera Servers (Enterprise, Connect, faspex)
Aspera Software On Demand Aspera Server Universal file transfer server supports desktop, web, mobile & embedded Aspera faspex Global Person-to-person file ingest & distribution Aspera Shares Global Person-to-person file transfer & exchange Aspera Console Global transfer monitoring, reporting & control Key Features On demand high-performance data transport to and from remote infrastructures Unlimited scale out of transfer capacity with additional AMIs Support for all Aspera Server software and use cases Additional Client Options: Mobile, Outlook Plug-in & Cargo (Aspera faspex) Flexible Storage Options: Local, EBS, AWS S3 Seamlessly interoperates with on-premise Aspera deployments Integrated Management and Monitoring Applications and Use Case High Performance Computing On Demand Content Aggregation, Transformation and Distribution Time-boxed event or project-based collaboration, ad-hoc distribution or content ingest
Cloud Pipeline with Aspera On Demand Direct to S3
1. User selects files and activates analysis pipeline 2. High-speed upload to S3 3. Download from S3 and scale out analysis 4. Deliver back to S3 5. Publish for distribution
[Diagram: Client (NY, NY) and NAS in a datacenter (Herndon, VA). Cloud path: fasp transfer at 1 Gbps: 7 sec; HTTP multipart into S3; parallel analysis on 14 instances: 3 min. Datacenter path: TCP transfer at 20 Mbps: 5.8 min; serial analysis in 14 passes: 35 minutes.]
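The timings on this slide are mutually consistent, which a few lines of arithmetic can confirm. The sketch below derives the implied payload from the "1 Gbps: 7 sec" figure and recomputes the TCP transfer time at 20 Mbps. Treating transfer and analysis as strictly sequential stages, and ignoring the delivery and publishing steps, are assumptions made for illustration.

```python
def transfer_seconds(size_bits: float, rate_bps: float) -> float:
    """Time to move size_bits at a sustained rate of rate_bps."""
    return size_bits / rate_bps

# Payload implied by "1 Gbps: 7 sec": 7e9 bits, about 875 MB
payload_bits = 1e9 * 7

# The same payload over the 20 Mbps TCP path
tcp_seconds = transfer_seconds(payload_bits, 20e6)

# Transfer plus analysis, in minutes, for each path
fasp_path_min = (7 + 3 * 60) / 60             # fasp transfer + parallel analysis
tcp_path_min = (tcp_seconds + 35 * 60) / 60   # TCP transfer + serial analysis

if __name__ == "__main__":
    print(f"TCP transfer: {tcp_seconds:.0f} s ({tcp_seconds / 60:.1f} min)")
    print(f"fasp + parallel analysis: {fasp_path_min:.1f} min")
    print(f"TCP + serial analysis: {tcp_path_min:.1f} min")
```

The recomputed TCP transfer time matches the slide's 5.8 min. Under the sequential assumption, the two paths differ mainly because of the analysis stage, with the transfer stage adding several more minutes on the TCP side.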
Aspera On Demand Offering ASPERA ON DEMAND DIRECT-TO-S3 (General Availability, February 2012) ASPERA AMI (LINUX 64-BIT HOST) Base Server Additional Free Client options: Connect browser plugin, Mobile, Embedded Console: External and Console AMI ADDITIONAL SERVER APPLICATION OPTIONS SUPPORTED Faspex (for any storage, including S3) w/ Faspex 3.0 Shares Storage: Local, EBS, and S3 S3 Credentials: Docroot w/ credentials configurable on server OR entered explicitly by client Usage-based pricing model
Aspera Developer Network A complete set of SDKs provides developers with guides, reference information, and sample code to assist them with integrating Aspera technology into their own applications. Aspera fasp technology can be used in desktop, network-based, and web applications in place of FTP, HTTP, or custom TCP-based copy protocols. Aspera Transfer APIs Aspera Web Services A SOAP-based web service API that allows initiation, monitoring, and control of fasp-based file transfers. Aspera Web JavaScript API exposed by the Aspera Connect client. It allows integration of fasp-based file transfers into web applications. Connect 2.8 developer Preview 2 Introducing the new Connect 2.8 developer preview! Integrate the functionality of Aspera Connect 2.8, a fasp-based file transfer client, into your own web applications, while customizing it to your unique brand. fasp Manager A class library that allows initiation, monitoring, and control of fasp-based file transfers. Aspera Multicast SDK A Java class library that allows initiation and management of IP multicast-based data transmissions using Aspera fasp-mc. Aspera Mobile APIs Android SDK Aspera Android SDK provides a Java API to transfer files using fasp-air. iPhone SDK Aspera iPhone SDK provides an Objective-C API to transfer files using fasp-air. Aspera Application APIs faspex Web API The Aspera faspex Web API provides a set of services that enables users to create and receive digital deliveries via a Web interface, while taking advantage of fasp high-speed transfer technology. Other Information Supporting Tools and Libraries Supporting tools and libraries let you perform other common tasks surrounding file transfers. General Reference Reference on error codes, log file locations, configuration files and more.
Thank you for Joining! For more information on any Aspera product, please contact sales@asperasoft.com John Heaton Regional Sales Engineering Manager john@asperasoft.com
Backup Slides
Aspera Sync Overview High-speed, multi-directional synchronization of remote files and directories Highly scalable - designed for today's extremely large data sets Highly efficient - designed for long distance WANs Secure - matches security standards set by government, FIPS 140-2 compliant Platform agnostic - runs on industry standard Linux and Windows Storage agnostic - compatible with any standard file or block storage system Familiar rsync-like interface shrinks the learning curve for IT professionals One-to-one, one-to-many, and full mesh synchronization Unidirectional, bi-directional; One time, scheduled, or continuous Application examples Offsite synchronization and replication for storage migration, disaster recovery and business continuity Bi-directional system mirroring for alternate access to digital content Hub and spoke sync for high-speed content collection or distribution Data migration for diverse storage environments
Aspera Sync technology advantages Built on top of fasp Overcomes TCP bottlenecks for maximum transfer speed regardless of file size, network or distance Full utilization of available bandwidth and protection of other network traffic Uncompromising security and reliability with user/endpoint authentication, AES-128 cryptography in transit and at rest, data integrity verification and automatic resume of partial or failed transfers Fastest possible resolution of file system changes Compares changes to local state (file snapshot), saving costly WAN chattiness of rsync File system notification where available (OS dependent) Superfast scan to detect changes in scan-mode for large incremental data sets Quick restart after system down Very efficient handling of common scenarios Replicates file moves and file renames on the source as a file move or rename on the target Huge savings from not re-transferring after move or rename Robust - waits for the system(s) to become stable (i.e., detects whether or not files are still being modified) before performing synchronization Supports broader set of use cases and deployment options Push and Pull modes Multi-directional, full mesh synchronization
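The idea of comparing against a local snapshot, and of replicating a move as a move instead of a re-transfer, can be sketched as below. This is an illustration of the general technique only, not Aspera Sync's implementation: the snapshot format (path mapped to a fingerprint string) and the rule that a removed path and an added path with equal fingerprints count as a move are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# A snapshot maps relative path -> fingerprint. What the fingerprint is
# (content hash, size plus mtime, ...) is left open in this sketch.
Snapshot = Dict[str, str]

def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, list]:
    """Classify changes between two local snapshots, with no remote round trips."""
    removed = {p: f for p, f in old.items() if p not in new}
    added = {p: f for p, f in new.items() if p not in old}
    modified = sorted(p for p in old if p in new and old[p] != new[p])

    # Index removed paths by fingerprint so that an added path with the
    # same fingerprint can be paired with one of them as a move.
    by_fingerprint: Dict[str, List[str]] = {}
    for path, fp in removed.items():
        by_fingerprint.setdefault(fp, []).append(path)

    moves: List[Tuple[str, str]] = []
    for path in sorted(added):
        candidates = by_fingerprint.get(added[path])
        if candidates:
            moves.append((candidates.pop(0), path))

    moved_from = {src for src, _ in moves}
    moved_to = {dst for _, dst in moves}
    return {
        "moved": moves,
        "added": sorted(p for p in added if p not in moved_to),
        "removed": sorted(p for p in removed if p not in moved_from),
        "modified": modified,
    }

if __name__ == "__main__":
    before = {"a/raw.bam": "h1", "a/notes.txt": "h2", "b/old.log": "h3"}
    after = {"archive/raw.bam": "h1", "a/notes.txt": "h2b", "c/new.csv": "h4"}
    print(diff_snapshots(before, after))
```

Only the "added" and "modified" entries would need data sent over the WAN; a "moved" entry can be replayed on the target as a rename, which is where the savings described on the slide would come from.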
Aspera Sync performance benchmarks

First run, small files: synchronizing many small files (average size 100 KB) over 1 Gbps WAN of 100 ms / 1%
Tool | Number of files | Data set size | Sync time | Throughput
Async | 978,944 | 93.3 GB | 9,968 sec (2.8 hours) | 80.4 Mbps
rsync | 978,944 | 93.3 GB | 814,500 sec (9.4 days) | 0.99 Mbps
Speed-up difference: 81x

First run, large files: synchronizing many large files (average size 100 MB) over 1 Gbps WAN of 100 ms / 1%
Tool | Number of files | Data set size | Sync time | Throughput
Async | 5,194 | 500.1 GB | 4,664 sec (1.3 hours) | 921 Mbps
rsync | 5,194 | 500.1 GB | 4,320,000 sec (50 days) | 0.98 Mbps
Speed-up difference: 940x

Second run, small files: synchronization time after adding 31,056 files to 1 million small files (100 KB each) over 1 Gbps WAN of 100 ms / 1%
Tool | Number of existing files | Number of files added | Total size | Sync time | Throughput
Async | 978,944 | 31,056 | 2.97 GB | 947 sec (16 min) | 26.9 Mbps
rsync | 978,944 | 31,056 | 2.97 GB | 37,076 sec (10.3 hrs) | 0.68 Mbps
Speed-up difference: 39x

Second run, large files: synchronization time after adding new files to a set of large files (100 MB) over 1 Gbps WAN of 100 ms / 1%
Tool | Number of existing files | Number of files added | Total size | Sync time | Throughput
Async | 5,194 | 54 | 5.49 GB | 54 sec | 871 Mbps
rsync | 5,194 | 54 | 5.49 GB | 54,573 sec (15 hrs) | 0.86 Mbps
Speed-up difference: 1000x
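The benchmark figures can be cross-checked from the data set sizes and sync times alone. The sketch below recomputes average throughput and the rsync-to-Async time ratio for each run. Reading "GB" as 2^30 bytes is an assumption, chosen because it appears to be the reading under which the quoted Async throughputs are reproduced most closely.

```python
GIB = 1024**3  # assumption: the slide's "GB" means 2**30 bytes

def throughput_mbps(size_gb: float, seconds: float) -> float:
    """Average throughput in Mbps (10**6 bits per second)."""
    return size_gb * GIB * 8 / seconds / 1e6

# (label, data size in GB, Async seconds, rsync seconds), taken from the slide
runs = [
    ("small files, first run", 93.3, 9_968, 814_500),
    ("large files, first run", 500.1, 4_664, 4_320_000),
    ("small files, second run", 2.97, 947, 37_076),
    ("large files, second run", 5.49, 54, 54_573),
]

if __name__ == "__main__":
    for label, size_gb, async_s, rsync_s in runs:
        print(f"{label}: Async {throughput_mbps(size_gb, async_s):.1f} Mbps, "
              f"rsync {throughput_mbps(size_gb, rsync_s):.2f} Mbps, "
              f"time ratio {rsync_s / async_s:.0f}x")
```

The recomputed time ratios land close to, but not exactly on, the quoted speed-ups, which suggests the slide's speed-up figures were rounded or derived from the throughput column.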
Aspera software product & technology portfolio Distribute: Complete portfolio of servers and clients for high-speed data delivery and distribution. Enterprise and Connect Server Universal file transfer server and web-based interface and directory listing Client and Point-to-point Uni- and bi-directional transfer clients Connect Web browser plug-in Mobile High-speed transfer for mobile devices Sync Highly scalable, multi-directional file replication and synchronization Collaborate: Global person-to-person and project-based exchange and collaboration of files and directories. faspex Server Secure digital delivery and collaborative file transfers with remote users and partners Web, email, mobile client options Comprehensive administration, user management & access control faspex Multi-Server / HA Automated bi-directional relays between sites 3-tier architecture with support for clustering, HA Cargo Automated package downloads Automate: Web-based application and SDK for creating and managing automated file-based workflows. Orchestrator Intuitive graphical workflow designer File processing decision tree and flow Rich and flexible plug-in architecture for third-party process integration Comprehensive library of plug-ins for transcoding, A/V, QC, archive, notifications High volume processing Detailed dashboard, workflow, and step-level progress reporting. Open development framework for designing and integrating automation pipelines Transport: Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network fasp Patented, file-based bulk data transport fasp3 Next-gen protocol for any bulk data Aspera On-Demand S3 Direct High-speed transfer direct to cloud storage (S3) fasp-air Uploads and downloads over 3G, LTE and Wi-Fi networks fasp-mc High-speed delivery over multicast Console transport management Centralized web-based management, monitoring, and reporting
Enterprise Performance, Scalability and Reliability Unlimited scale-out faspex application and Enterprise Server can be installed on separate hosts Allows clustering of Enterprise Server nodes for unlimited scale-out of transfer capacity (I/O and network capacity) Multi-server relay Master server and Relay server configuration Transfers from one user to another are relayed to the user's home server Simple administration of relay server through the master with automatic synchronization of user accounts between master and peers High-availability configuration faspex server is designed to be deployed in an active / passive HA configuration Ensures continuous availability of the faspex application Seamless automatic retry and resume of transfer sessions on failover assuming shared storage
Ad-hoc 3rd-party submission and distribution Ad-hoc submission and distribution capabilities fully integrated with e-mail; does not require a registered faspex account Upload / contribution and distribution privileges can be set to auto-expire by the administrator using time-based policies Uploads are delivered to private area with private tracking and notification Metadata entries for drop boxes are fully configurable per drop box, with metadata stored as an XML file on the server Rich reporting on metadata available through Aspera Console
Cargo Downloader automatic download with faspex Key Features Familiar ATOM feed paradigm for automatically downloading faspex packages for both Windows and Mac Supports secure embargoes on content access with built-in encryption Easy GUI controls for automatic decryption of all content in a downloaded package Configurable download target location per faspex server and account Support for a configurable number of concurrent downloads and queuing Real-time transfer rate control and monitoring for each download Pause, cancel and resume functionality with automatic retry on failure Application Examples Extends person-to-person file delivery workflows with automatic downloads of faspex packages Receive packages from multiple faspex servers, consolidating all downloads to a single client Automatically bringing in new content for follow-the-sun working with no delay
Aspera Add-in for Microsoft Outlook Key Features Leverages the strength of the Aspera fasp transport protocol and existing faspex software and infrastructure for high-speed transfer of large file and directory attachments Senders follow standard workflow to compose emails, recipients receive original email, plus faspex email Recipients can transparently download received attachments from faspex server No limits on size of email attachments - attachments can be any size with virtually no limit Entire folders and directories can be sent, without requiring any compression Offloads processing from Exchange Server Configurable size of the file attachments that trigger an Aspera transfer Application Examples Extends person-to-person file delivery workflows to the most common communication tool: email Research data distribution
Aspera faspex mobile app for iPad and iPhone High-speed fasp mobile transfer app specifically designed for iOS devices Native iOS app look and feel Familiar faspex email-style workflow, adapted to the style of the iOS email app Fully integrated with all other faspex clients Integrated with fasp and faspex security models, encryption at rest enabled via simple toggle Sending and receiving video and imaging fully integrated with Photo Gallery
Universal iOS app, fully integrated with Photo Gallery A single app for both iPad and iPhone devices Dynamically adjusts, taking specific advantage of each device's footprint and characteristics The app makes sense to your users regardless of which device type they are on Fully integrated with the Photo Gallery Downloaded picture and video packages can be stored directly in the Photo Gallery with a single click Photos and videos can be accessed directly from the Photo Gallery and sent as standard faspex packages Users can shoot pictures and video with the built-in camera and immediately upload them for high-speed delivery
Comprehensive Security Full support of the Aspera fasp™ security model Secure authentication, encryption of the data using strong cryptography, per-packet integrity verification to protect against man-in-the-middle compromise, and FIPS 140-2 compliance Supports all LDAP directory services for import, synchronization, and direct authentication: Open Directory, OpenLDAP, Active Directory Configurable security options Automatic deactivation after set number of failed login attempts Concurrent login prevention, session timeout, strong passwords Configurable per IP and mask upload, download, login permission Sending and receiving from 3rd-party non-registered users, with policy-based expiration Configurable data encryption in flight and at rest Package contents can be encrypted over-the-wire, or optionally at rest for complete security Encrypted packages can be decrypted on the fly when downloaded by the recipient using one of the faspex clients such as the iOS faspex Client, Aspera Add-in for Outlook, Aspera Connect browser plug-in, or Aspera Cargo downloader The sender may choose when and to whom to distribute the secret, and thus prevent unauthorized users from accessing the content, and control when recipients are able to decrypt the downloaded content
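Per-packet integrity verification can be illustrated with a keyed message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library purely as an example of the general technique; the slides do not say which construction fasp uses, and the packet layout here (8-byte sequence number, payload, 32-byte tag) is hypothetical.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32     # HMAC-SHA256 digest size in bytes
HEADER_LEN = 8   # hypothetical header: one unsigned 64-bit sequence number

def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Build a packet: sequence number + payload + authentication tag."""
    header = struct.pack("!Q", seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag

def open_packet(key: bytes, packet: bytes):
    """Verify the tag and return (seq, payload); raise ValueError if tampered."""
    if len(packet) < HEADER_LEN + TAG_LEN:
        raise ValueError("packet too short")
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    (seq,) = struct.unpack("!Q", body[:HEADER_LEN])
    return seq, body[HEADER_LEN:]

if __name__ == "__main__":
    key = b"k" * 16  # placeholder key for the demo only
    packet = seal(key, 7, b"chunk of file data")
    print(open_packet(key, packet))
```

Covering the sequence number with the tag means a modified or re-labelled packet is rejected on receipt, so corruption is caught per packet instead of only after the whole file has arrived.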
Use Case: Disaster Recovery / Business Continuity Goal - Protection against unplanned outages or data loss Replicate mission critical data from a primary site to one or more alternate sites Systems remain available after a system outage or site loss Critical data is preserved Examples Complete system replication for disaster recovery and data center migration Key Features and Benefits Fastest possible end-to-end sync (fasp™): replication fits within the operational window and minimizes recovery time, meeting both the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) High scalability allowing protection of large files and file systems Continuous, ultra-efficient synchronization ensuring offsite systems are always up to date Command Line Interface for custom scripting to meet specific requirements Console Monitoring to ensure system is operational Primary Site Alternate Site
Use Case: Content Distribution / System Mirroring Goal - Mirrored systems offering alternate access to content Replicate digital content from one system to another Content is always available, offering high availability of the service Examples Digital Content distributed through a commercial subscription service Key Features and Benefits Fastest possible transport (fasp™) ensures content is quickly distributed and readily available High scalability allowing synchronization of large files and file systems Continuous Synchronization through OS notification ensuring offsite systems are always up to date Web Application Content Upload Primary Site Alternate Site
Use Case: Hub and spoke for collection or distribution Goal - Automatic collection or distribution of content to / from a centralized hub Automatically replicate digital content from one system to another Content is always available, offering high availability of the service Examples Daily collection or distribution of data across a large network of end points Continuous update of online content, software and media Distribute periodic content updates to remote locations or offices Key Features and Benefits Instant recognition of file system changes (new, deleted, renamed content) Fastest possible transport (fasp™) ensures content is quickly distributed and readily available Highly scalable architecture supports many concurrent sync sessions in push, pull or bi-directional configuration Centralized Server
Use Case: Data migration for diverse storage environments Goal Replicate data across different storage platforms Replicate entire file system, or portions of the file system Preserve file attributes such as permissions, access times, ownership, etc. Examples Migrate data from legacy to new storage system Support multiple host operating systems across diverse storage platforms Key Features and Benefits Hardware and system agnostic, able to replicate across disparate platforms and systems High-speed over the WAN, can complete data migration in a fraction of the time of proprietary systems using traditional protocols Highly scalable allowing timely migration of very large files and file systems Existing Storage Newly Provisioned Storage
Common Cloud Use Cases DATA PROCESSING & CONTENT CREATION Compute Intensive: EC2 (10s, 100s, 1000s of CPUs) Big-data analytics & HPC STORAGE FOR ARCHIVE & D/R B2B / B2C data ingest & distribution Offsite storage for disaster recovery, business continuity, long term storage for archive DATA & CONTENT DISTRIBUTION Collaborative data exchange CDN and global delivery