Bursting to a Hybrid Cloud for Services
OFC 2015
Big Data applications
Big Compute in the cloud
Why burst to the cloud?
Opportunities
Big Data Apps Need Big Compute
Life Sciences: Bioinformatics, Next Gen Sequence Analysis
Video in Media & Entertainment: Video Transcoding, Rendering
Engineering Design and Simulation: Computer Aided Engineering, Fluid Dynamics Simulation
Analytics: Hadoop, Predictive Algorithms, Data Mining
Oil & Gas: Seismic Processing, Reservoir Simulation
Characteristics: computationally intensive, large datasets, increasingly distributed, processing speed matters
Big Compute in Life Science
CardioDX performs genomic research. One of its major initiatives over the past several years was developing a predictive test that could identify coronary artery disease in its earliest stages. To do so, the company's researchers analyzed over 100 million gene samples and ultimately identified the 23 primary predictive genes for coronary artery disease. The resulting test, the Corus CAD Test, was recognized by TIME Magazine as one of the Top Ten Medical Breakthroughs of 2010.
A single human genome is approximately 3 GBytes of information (http://sandwalk.blogspot.com/2011/03/how-big-is-human-genome.html).
Approximately 20,000 genes per genome, or about 150 KB per gene.
100 million gene samples × 150 KB is about 15 TB of information, roughly fifteen 1 TB internal hard drives.
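The data-volume estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the slide's inputs (3 GB per genome, about 20,000 genes, 100 million samples); the decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-envelope data volume for the gene-sample estimate.
# Inputs are the slide's figures; decimal units are an assumption.

GENOME_BYTES = 3 * 10**9        # ~3 GBytes per human genome
GENES_PER_GENOME = 20_000       # ~20,000 genes per genome
SAMPLES = 100_000_000           # 100 million gene samples


def per_gene_bytes(genome_bytes: int, genes: int) -> int:
    """Average bytes per gene if the genome is divided evenly among genes."""
    return genome_bytes // genes


def total_terabytes(samples: int, bytes_per_sample: int) -> float:
    """Total volume in decimal terabytes."""
    return samples * bytes_per_sample / 10**12


gene_bytes = per_gene_bytes(GENOME_BYTES, GENES_PER_GENOME)
total_tb = total_terabytes(SAMPLES, gene_bytes)
print(f"{gene_bytes // 1000} KB per gene, {total_tb:g} TB total")
```

Dividing the whole genome evenly among genes is a simplification; the sketch only reproduces the slide's rough estimate, not a measured dataset size.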
Big Compute in Life Science
Burrows-Wheeler Alignment (BWA): part of genomic sequencing and a prelude to pattern discernment.
Application: highly parallel compute, huge data set, numerous iterations.
Cloud-based solutions: single workstation 1x, cloud cluster 324x, accelerated hybrid 2268x.
[Chart: Genome Mapping Application (BWA), # Jobs per Hour per iteration, Amazon AWS vs. Hybrid Cloud]
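BWA builds its reference index on the Burrows-Wheeler transform (BWT). The naive sketch below sorts every rotation of a string, which is far too slow for a real genome, but it shows the transform and its inverse on toy sequences. Production aligners build the index with suffix-array construction and an FM-index, which this sketch does not implement.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    The sentinel must be unique and sort before every other character.
    """
    if sentinel not in text:
        text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the BWT by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    # After len(last) rounds the table holds all sorted rotations;
    # the original text is the one ending in the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    fragment = "GATTACA"
    transformed = bwt(fragment)
    print(transformed)
    print(inverse_bwt(transformed))
```

Because the BWT groups repeated contexts together, the transformed string compresses well and supports fast substring search, which is what makes it useful for mapping short reads against a large reference.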
Big Compute in Analytics
Real-time analytics
Application: real-time tracking of paid TV media and the related earned digital activity across social, search and video.
Big Data problem: proprietary audio and video fingerprinting identifies content across 67.4k TV spots and 38.7MM airings on 103 broadcast/TV networks, all in real time.
Big Compute problem: results are needed in minutes, not hours, days or weeks, and both compute demand and clients' immediate needs are difficult to forecast.
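When demand cannot be forecast, one common approach is to size the burst at run time from the work actually queued. This is a minimal sketch that assumes independent, evenly spread jobs, a hypothetical per-node throughput, a fixed on-premises node count and a results deadline in minutes; none of these figures come from the slide.

```python
import math


def cloud_nodes_needed(queued_jobs: int,
                       jobs_per_node_minute: float,
                       local_nodes: int,
                       deadline_minutes: float) -> int:
    """Extra cloud nodes required to clear the queue before the deadline.

    Assumes every node (local or cloud) has the same hypothetical
    throughput and that jobs are independent.
    """
    if deadline_minutes <= 0 or jobs_per_node_minute <= 0:
        raise ValueError("deadline and throughput must be positive")
    per_node_capacity = jobs_per_node_minute * deadline_minutes
    total_nodes = math.ceil(queued_jobs / per_node_capacity)
    # Local capacity is used first; only the shortfall is burst to the cloud.
    return max(0, total_nodes - local_nodes)


# Hypothetical example: 12,000 matching jobs, 10 jobs per node-minute,
# 40 local nodes, results needed within 15 minutes.
print(cloud_nodes_needed(12_000, 10, 40, 15))
```

Recomputing this on every scheduling interval lets the burst grow and shrink with the actual backlog instead of a forecast.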
Big Compute in Analytics
Top 10 Digital Share of Voice (SOV) of the week (3/16/15)
Android: Friends Furever
1.9% digital SOV
1,099,154 online views
19,689 social actions
$235,641 est. TV spend
Big Compute in Engineering Simulation
Fluid modeling
Application: fluid dynamics simulation.
Big Data problem: fine-point analysis provides more accurate and complete simulation results.
Big Compute problem: fine-point analysis multiplies the computational cost of the simulation many times over, and simulation cycle time is real money to clients.
Solution: cloud-based processing, network and storage for a purpose-built application.
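To see why finer resolution is so costly, consider a common textbook estimate for an explicit, time-stepped 3D solver (an assumption; the slide does not name the solver). Refining the grid spacing by a factor r multiplies the cell count by r^3, and the CFL stability limit shrinks the time step by roughly the same factor r:

```latex
N_{\text{cells}} \propto r^{3}, \qquad
N_{\text{steps}} \propto r, \qquad
\text{cost} \propto N_{\text{cells}} \, N_{\text{steps}} \propto r^{4}
```

Under this estimate, halving the grid spacing (r = 2) multiplies compute per simulation by about 2^4 = 16.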
Big Compute in Engineering Simulation
Fluid modeling
Yesterday: application on workstation; dataset on workstation; low utilization; compute time 5 hours per iteration; $35k on workstation.
Today: burst to application in cloud; dataset in private cloud; on-demand compute; compute time 30 minutes per iteration; $19k in cloud.
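The before/after figures translate directly into speedup and cost savings. A minimal sketch using only the slide's numbers:

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster one iteration completes after the change."""
    return before_minutes / after_minutes


def savings_fraction(before_cost: float, after_cost: float) -> float:
    """Fraction of the original cost that is saved."""
    return (before_cost - after_cost) / before_cost


workstation_minutes, cloud_minutes = 5 * 60, 30   # 5 hours vs 30 minutes per iteration
workstation_cost, cloud_cost = 35_000, 19_000     # $35k vs $19k

print(f"speedup: {speedup(workstation_minutes, cloud_minutes):.0f}x")
print(f"savings: {savings_fraction(workstation_cost, cloud_cost):.0%}")
```

The slide does not say what period or workload the dollar figures cover, so the savings ratio is only as comparable as those two numbers are.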
Bursting
Why? Batch and/or lumpy application demand; short-lived projects; capex and opex cost; users don't want to wait for IT.
How? Application orchestration; compute (Big Compute PaaS); network (SDN, OpenFlow); storage (OpenStack, Hadoop).
Big Data + Big Compute requires data movement: inside the datacenter (easiest), datacenter to datacenter (easier), over the public network (cost and availability).
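Data movement is what turns bursting into a trade-off: shipping a dataset to the cloud pays off only if transfer time plus cloud compute time beats running locally. This is a minimal sketch of that comparison, assuming decimal units, a fully utilized link and hypothetical inputs (dataset size, link speeds and compute times are illustrative, not from the slide).

```python
def transfer_hours(dataset_tb: float, link_gbps: float) -> float:
    """Hours to move dataset_tb terabytes over a link_gbps link at full rate."""
    bits = dataset_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


def should_burst(dataset_tb: float, link_gbps: float,
                 local_hours: float, cloud_hours: float) -> bool:
    """Burst only if moving the data and computing remotely finishes sooner."""
    return transfer_hours(dataset_tb, link_gbps) + cloud_hours < local_hours


# Hypothetical job: 15 TB dataset, 24 h locally, 4 h on a cloud cluster.
for gbps in (1, 10, 100):
    hours = transfer_hours(15, gbps)
    print(f"{gbps:>3} Gb/s: transfer {hours:.1f} h, burst={should_burst(15, gbps, 24, 4)}")
```

With these inputs a 1 Gb/s link makes bursting slower than staying local, while 10 Gb/s and faster make it worthwhile; egress charges and link availability could be added to the same comparison.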
Opportunities
Data movement: technically solved, but operationally cumbersome. Costs are high, prohibitive or highly variable (help!). Standards are moving very quickly (help!).
Hybrid architectures: two fundamental types of storage for big data; three fundamental types of compute. Applications need optimization and abstraction (help!).
Application burst orchestration (read: simple, simple, simple): private clouds (homogeneous case); virtual private clouds (turnkey by integrator); private/public clouds (help!).
Applications drive value, not the network. Vertically integrated MSPs usually lead the way (help!).