Deterministic capacity planning for OpenStack Keith Basil Principal Product Manager, Red Hat Sean Cohen Principal Product Manager, Red Hat Tushar Katarki Principal Product Manager, Red Hat
http://sharpwriter.deviantart.com/art/welcome-to-the-internet-please-follow-me-322248378 http://creativecommons.org/licenses/by-nc-nd/3.0/ devops headband, BOFH Slayer gun handle and OpenStack unicorn branding added for effect. Not for redistribution.
AGENDA OpenStack as an Elastic Cloud Determinism in Infrastructure Compute for Elastic Clouds Storage for Elastic Clouds Networking for Elastic Clouds Putting It All Together
Keith Basil personal Virginia hare scrambler, plays chess.. professional Red Hat Cloudscaling, Time Warner Cable, FederalCloud.com, Cisco and a couple of startups blended skype/twitter/github/irc, life: noslzzp
Sean Cohen personal Jazzman, oil painting & tennis... professional Red Hat Dot Hill Systems, Cloverleaf Communications, VerticalNet blended skype: sean.redhat, irc: scohen
Tushar Katarki personal Two kids and the wife, squash, hike/bike professional Red Hat 15 years in IT infrastructure development Sun Microsystems, Oracle
HELLO, my name is OpenStack. Hello.. I'm Your Elastic Cloud.
OpenStack...
Is open source software and a vibrant community
Provides a framework for an elastic cloud
Benefits from deterministic deployment approaches
Elastic Cloud != Enterprise Virtualization
Elastic Cloud Workloads | Enterprise Virt Workloads
Applications expect failure | Workloads NOT designed to tolerate failure
Smaller stateless VMs | Larger stateful VMs
Applications scale out horizontally with VMs of predetermined capacity | Workloads scale up within custom VMs (more vcpu, vram)
Lifecycle measured in hours to minutes | Lifecycle measured in years
Scale Out - Servers are like cattle. | Scale Up - Servers are like pets.
Difference in the resource requests?
8) "I want an m1.small please."
8) "I want 6 vcpus, 4 GB and 120 GB disk please."
The first is provider determined (a predefined flavor). The second is user determined (custom sizing).
OpenStack in 2 Minutes!
"I would like an m1.medium VM please!"
"Umm, do I know you? I need to see some papers!!"
"Papers are good. Time to get to work!"
"Ok, we need to find a place to build this VM. Tag - you're it!"
"Hey Glance, can I get the RHEL 6.4 image?"
"Cinder, have that volume ready for me?" "Indeed I do. Don't forget to mount it!"
"Neutron, I need a network with all the trimmings!" "Here's your IP, default route and FW settings."
"It's rendering time! 8)"
"Thank you OpenStack!! 8)"
(Diagram: Keystone, Nova, Glance, Swift, Cinder, Neutron and compute nodes, with capacity and instance capacity labels.)
Your Mission, Should You Choose to Accept It..
"If you're going to do operations reliably, you need to make it reproducible and programmatic." - Mike Loukides, What is DevOps?
"Applications are what matter. Anything that gets apps deployed faster and helps companies manage the proliferation of apps is good. Hence, DevOps." - Mark Imbriaco, VP of Ops, Digital Ocean
The goal is to keep your devops heroes in play!
Determinism in Infrastructure
Let's Break The Myth...
There is no such thing as infinite scale in cloud computing.
All computing requests, even for virtualized resources, ultimately map to physical devices -> finite resources.
Capacity Planning in the Cloud
Every provider has limits, even if they're massive.
Adding the word Cloud simply squeezes the limit balloon. It doesn't eliminate the issue, even with elasticity.
The service provider is responsible for risk mitigation of the capacity it rents.
Infrastructure as building code
Why History matters..
Capacity planning and performance monitoring in the context of public providers:
Can be done only by understanding the history of a specific cloud provider.
Requires a cloud performance application to understand both the current state of the provider and its performance history over a given period of time.
Implicit Contract
Cloud tenants have a service level expectation.
Cloud operators have business constraints.
(Diagram: operators and tenants, linked by an implicit contract.)
Capacity Planning in the Cloud
Cloud users buy services based on capacity, protected by an SLA.
Cloud providers need deterministic capacity planning to support elastic growth.
(Diagram: operators and tenants, linked by an implicit contract.)
Deterministic Capacity Planning
Determinism is the best measure we have for predicting the effort and expense of making a process consistently performant.
When your service becomes a critical part of a customer's infrastructure, their fate becomes wedded to the SLAs you deliver.
In cloud computing, the service's performance will not be measured by its average speed but by the consistency of its speed.
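To make the point about consistency versus average speed concrete, here is a minimal sketch. The latency samples are hypothetical and the function name `consistency_report` is invented for illustration. Both series have the same mean, yet one of them would be far harder to build an SLA on.

```python
import statistics

def consistency_report(samples_ms):
    """Summarize a latency series by its spread, not only by its mean."""
    mean = statistics.mean(samples_ms)
    stdev = statistics.pstdev(samples_ms)
    return {
        "mean_ms": mean,
        "stdev_ms": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean,
        "worst_ms": max(samples_ms),
    }

# Hypothetical samples (assumption): both series average 10 ms.
steady = [9, 10, 11, 10, 10, 9, 11, 10]
spiky = [2, 3, 2, 40, 3, 2, 26, 2]

steady_report = consistency_report(steady)
spiky_report = consistency_report(spiky)

print("steady:", steady_report)
print("spiky: ", spiky_report)
```

Comparing `cv` and `worst_ms` rather than `mean_ms` is one simple way to rank providers or node types by how predictable they are.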
Modeling Performance
Using this information, we're able to more accurately determine the capacity of a public provider.
Monitor performance spikes and valleys over time.
This means we can more accurately model for performance, and thus capacity.
Benchmarks can provide useful insight for performance analysis and capacity planning http://cloudharmony.com/benchmarks
Deterministic Concepts & Goals AWS and GCE as models You want 2048, not Tetris Scheduling made easy Scaling made easy Optimal hardware use (no holes or hot spots) Performance consistency
How do we achieve determinism for these core OpenStack services?
Compute for Elastic Clouds
Solving resource contention in Compute
CPU, Memory, Disk -> Compute Instance Family
Public Cloud VM Instances Exposed!
Share | Size | n1-standard.class | m1.class
1/1 | xlarge | n1-standard-8 | m1.xlarge
1/2 | large | n1-standard-4 | m1.large
1/4 | medium | n1-standard-2 | m1.medium
1/8 | small | n1-standard-1 | m1.small
We can take this approach with OpenStack
Solve for the biggest VM in the class.
We can easily derive the entire instance family because smaller instances are fractional proportions of the largest. This facilitates efficient hardware use and scheduling.
xlarge = 1/1, large = 1/2, medium = 1/4, small = 1/8
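The derivation of a family from its largest member can be sketched in a few lines. This is only an illustration: the function name `derive_family` and the resource numbers for the largest flavor are assumptions, not values from the slides.

```python
def derive_family(prefix, largest, sizes=("xlarge", "large", "medium", "small")):
    """Derive every flavor as 1/1, 1/2, 1/4, 1/8 of the largest one."""
    family = {}
    for step, size in enumerate(sizes):
        divisor = 2 ** step  # 1, 2, 4, 8
        family[f"{prefix}.{size}"] = {
            resource: amount // divisor for resource, amount in largest.items()
        }
    return family

# Hypothetical largest flavor (assumption): placeholder numbers only.
LARGEST = {"vcpus": 8, "ram_mb": 16384, "disk_gb": 160}
m1_family = derive_family("m1", LARGEST)

for name, spec in m1_family.items():
    print(name, spec)
```

Because every size is a power-of-two fraction of the largest, only the largest flavor needs to be sized against the hardware; the rest follow mechanically.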
Efficient Bin-Packing with Fractional Proportions
Given the machine config below, it would support:
(4) n1-standard-8-d, (8) n1-standard-4-d, (16) n1-standard-2-d or (32) n1-standard-1-d
(8) m1.xlarge, (16) m1.large, (32) m1.medium or (64) m1.small
Compute Hardware Node (general compute instance family): 128GB memory, (16) 1TB disks, (2) E5-2670 CPU
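The per-node counts above can be reproduced with a small sketch, assuming memory is the binding constraint and ignoring hypervisor overhead. The per-instance memory figures (15 GB for m1.xlarge, 30 GB for n1-standard-8-d) are assumptions taken from the providers' published sizes of that era, not from the slide, and `family_capacity` is a hypothetical helper name.

```python
NODE_MEMORY_GB = 128  # hardware node from the slide

def family_capacity(node_memory_gb, largest_memory_gb, names):
    """Count how many instances of each size fit on one node.

    names must be ordered largest first; each step halves the size,
    so each step doubles the count.
    """
    largest_count = node_memory_gb // largest_memory_gb
    return {name: largest_count * 2 ** step for step, name in enumerate(names)}

# Assumed memory of the largest flavor in each family (not from the slide).
m1_counts = family_capacity(
    NODE_MEMORY_GB, 15, ["m1.xlarge", "m1.large", "m1.medium", "m1.small"]
)
n1_counts = family_capacity(
    NODE_MEMORY_GB,
    30,
    ["n1-standard-8-d", "n1-standard-4-d", "n1-standard-2-d", "n1-standard-1-d"],
)

print(m1_counts)
print(n1_counts)
```

Since each size is an exact fraction of the largest, any mix of sizes whose shares sum to a whole node packs without leaving holes.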
Efficient Scheduling with Fractional Proportions
General Purpose Instance Families: n1-standard; m1; A1 - A4
Memory Optimized Instance Families: n1-highmem; m2, cr1; A5 - A7
CPU Optimized Instance Families: n1-highcpu; c1, cc2, c3
(Diagram: scheduling places xlarge, large and medium instances onto a general compute node, a memory optimized node and a CPU optimized node.)
Compute Calculator Intro Designed to help determine optimal compute hardware configurations Visually shows resource constraints Allows custom instance families Walk through
Storage for Elastic Clouds
Solving resource contention in Block Storage
Throughput, Performance (IOPS/latency), General Storage -> Block Storage Volume Types
What Are the Public Clouds Doing with Storage?
Capacity Optimized (standard): no IOPS guarantees; workloads with moderate IO; billed by size and IO usage.
Performance Optimized: guaranteed IOPS (SSDs); IOPS per GB with low latency for I/O intensive workloads; billed by size and IO usage.
Blended Approach (Performance Scaled with Capacity): ephemeral disks deprecated!; IOPS scale with volume size; attached volume limits; billed by size only.
Block Storage Classes in OpenStack
Performance Optimized Storage: all SSDs
Throughput Optimized Storage: fast SAS drives with RAID 5/6, throughput tuned network, high bandwidth internal bus
Capacity (General) Optimized Storage: larger SATA drives
(Diagram: Cinder scheduling across a performance optimized storage node, a throughput optimized storage node and a general storage node.)
Storage Tiers with OpenStack Cinder
OPERATOR
1. Define storage back ends
2. Create Volume Types: General, Performance, Throughput
TENANT
3. Create Volumes
    # cinder create \
        --volume_type IOPS_OPTIMIZED_TYPE \
        --display_name volume-1 50
Capacity (General) Optimized Storage
Net usable capacity depends on the raw capacity of the storage, replication and RAID type.
RAID type | 2-way replication | 3-way replication
RAID5 | 2.2 | 3.3
RAID6 | 2.4 | 3.6
RAID10 | 4 | n/a
Example: twelve (12) 1TB disks, configured for RAID6 and 2-way replication, would yield 5.0TB of usable capacity. 12TB / 2.4 = 5.0TB net usable capacity.
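The worked example can be expressed as a small calculation. The overhead factors are copied from the table; the function name `usable_capacity_tb` is hypothetical, and the sketch assumes usable capacity is simply raw capacity divided by the table factor.

```python
# Overhead factors from the table: (RAID type, replication ways) -> divisor.
OVERHEAD = {
    ("RAID5", 2): 2.2,
    ("RAID5", 3): 3.3,
    ("RAID6", 2): 2.4,
    ("RAID6", 3): 3.6,
    ("RAID10", 2): 4.0,  # RAID10 with 3-way replication is listed as n/a
}

def usable_capacity_tb(disk_count, disk_tb, raid_type, replicas):
    """Net usable capacity = raw capacity / overhead factor."""
    factor = OVERHEAD.get((raid_type, replicas))
    if factor is None:
        raise ValueError(f"no factor for {raid_type} with {replicas}-way replication")
    return disk_count * disk_tb / factor

# The slide's example: twelve 1TB disks, RAID6, 2-way replication.
example_tb = usable_capacity_tb(12, 1.0, "RAID6", 2)
print(f"{example_tb:.1f} TB usable")
```

Running the same function in reverse (usable target times factor) gives the raw capacity to procure for a planned tier.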
Performance Optimized Storage
IOPS scale linearly with VM count
Limits should be seen as triggers for storage scale out
(Charts: write latency, read latency)
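One way to treat limits as scale-out triggers is sketched below. Every number here is hypothetical (per-VM IOPS, per-node IOPS limit, the 80% trigger), and `storage_nodes_needed` is an illustrative name; the sketch only assumes, as the slide states, that IOPS demand grows linearly with VM count.

```python
def storage_nodes_needed(vm_count, iops_per_vm, node_iops_limit, trigger_pct=80):
    """Return how many storage nodes keep demand under the trigger level.

    Demand is assumed linear in VM count. A node is considered full once
    it reaches trigger_pct percent of its IOPS limit, which leaves headroom
    before latency degrades.
    """
    demand = vm_count * iops_per_vm
    usable_per_node = node_iops_limit * trigger_pct // 100
    # Integer ceiling division, with at least one node.
    return max(1, -(-demand // usable_per_node))

# Hypothetical figures: 100 IOPS per VM, 20000 IOPS per storage node.
for vms in (100, 160, 161, 400):
    print(vms, "VMs ->", storage_nodes_needed(vms, 100, 20000), "node(s)")
```

The VM count at which the result increments is the point where procurement of the next storage node should already be under way.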
Throughput Optimized Storage Throughput response matters The Read/Write mix matters Influenced by RAID type
Storage Planning
Step 0: What is my Cloud Storage offering? Capacity Based, Performance (IOPS) Based, Throughput (Bandwidth) Based
Step 1: What Storage Tiers do I need? Capacity Optimized, Performance Optimized, Throughput Optimized
Step 2: Storage Capacity Planning. Workload projections, Performance Observations, Metrics to be optimized, and Calculators
Step 3: Procure and Deploy
Step 4: Manage and Steer. Schedulers
Networking for Elastic Clouds
Solving resource contention for the Network
Throughput, Latency, Resiliency -> Core Network
Enterprise vs Cloud Fabric Traditional Enterprise Topology Modern Cloud Friendly Topology Network diagrams referenced from http://cto.vmware.com/is-your-cloud-ready-for-big-data/
Network Elasticity is Required..
(Diagram: Elastic Cloud Resource Map, a grid of compute nodes interspersed with block store nodes.)
Because your cloud will grow.. Each unit here could be a server, or a rack of servers.
Core Fabric Requirements OpenStack friendly networking features: Availability and Resiliency (multi-path, per-flow routing) Resource Node (compute/storage) Data Throughput Network Latency Congestion Management
Spine and Leaf Topology Ask your friendly network vendor for guidance Cisco, ARISTA, Brocade, Juniper, Force10, etc. http://bradhedlund.com/2012/01/25/construct-a-leaf-spine-design-with-40g-or-10g-an-observation-in-scaling-the-fabric/
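As a rough illustration of how a leaf-spine fabric scales, the sketch below sizes a two-tier design. It assumes every leaf has exactly one uplink to every spine, and all port counts and speeds are hypothetical; `leaf_spine_capacity` is an illustrative name. Actual designs should follow vendor guidance.

```python
def leaf_spine_capacity(spine_ports, leaf_downlinks, downlink_gbps,
                        leaf_uplinks, uplink_gbps):
    """Size a two-tier leaf-spine fabric.

    Assumes each leaf connects once to every spine, so the number of
    spines equals the uplinks per leaf and the number of leaves is
    bounded by the ports on one spine.
    """
    return {
        "spines": leaf_uplinks,
        "max_leaves": spine_ports,
        "max_servers": spine_ports * leaf_downlinks,
        # Server-facing bandwidth divided by spine-facing bandwidth per leaf.
        "oversubscription": (leaf_downlinks * downlink_gbps)
        / (leaf_uplinks * uplink_gbps),
    }

# Hypothetical switches: 32-port 40G spines; leaves with 48 x 10G down, 4 x 40G up.
fabric = leaf_spine_capacity(
    spine_ports=32, leaf_downlinks=48, downlink_gbps=10,
    leaf_uplinks=4, uplink_gbps=40,
)
print(fabric)
```

Growing the cloud then means adding leaves (racks) until the spine ports run out, which is the kind of predictable limit a capacity plan can be built around.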
Putting it All Together
Remember our Hero!
Plan for the Resource Service Level
(Diagram labels: Network Fabric, Cloud Controller, Compute/Storage Resource Service Level.)
High level architecture
Deterministic Network
OpenStack Core Services: core services
Deterministic Resources: General Purpose Compute, Performance Storage, General (Capacity) Storage
Scale Out (as needed)
Questions?
Resources
https://github.com/noslzzp/cloud-resource-calculator
What is DevOps? http://oreil.ly/1jbcsau - free!
Open source tools include: Graphite, Ganglia
Public cloud benchmarks: Cloudharmony.com, Cloudsleuth.com (Global Provider View)
Thank You! Check out these sessions!
Red Hat Enterprise Linux OpenStack Platform High Availability. Arthur Berezin, Technical Product Manager, Red Hat. Wednesday, April 16, 2:30 pm - 3:30 pm
Deploying Red Hat Enterprise Linux OpenStack Platform in the enterprise with FlexPod. Arthur Enright, Field Product Manager, Red Hat; NetApp and Cisco. Wednesday, April 16, 3:40 pm - 4:40 pm
Deep dive: OpenStack Compute. Steve Gordon, Technical Product Manager, Red Hat. Thursday, April 17, 9:45 am - 10:45 am