Migrating 200+ Load Balancers Into Mesos
Stephen Salinas - @shsalinas2012
Load Balancing
Marketing Software Sales Software
PaaS at HubSpot
  Over 150 engineers
  Over 2000 deployables
  ~350 daily deploys to prod
api.hubapi.com/email Load Balancer
Load Balancer Updates
  [Diagram: ye_old_deploy.py writes config files, starts the service, and updates the load balancer.]
[Diagram: Mesos Master and Singularity Scheduler running the /email, /meetings, and /reports services. The scheduler sends load balancer updates to the Baragon API, which updates the load balancer serving api.hubapi.com/email.]
The Two-Phase Commit
  Singularity Scheduler sends the request to the Baragon API: host, port, path, domain, extra config
  Baragon API:
    Find relevant agents
    Check for conflicts
    Trigger update on agents
  Each agent (for example in the api.hubapi.com and somethingelse.com groups):
    Apply complete request data
    Write config file from a template
    Valid? Return success, or revert and return a failure
  Baragon API collects the agent responses and responds to Singularity with the result
  Singularity Scheduler:
    Hold old tasks until they are removed from the LB
    Don't consider new tasks successfully started unless adding to LB succeeds
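The apply, validate, and revert flow can be sketched roughly as follows. This is a minimal in-memory illustration, not Baragon's actual code: the `Agent` class, the `fail_validation` flag, and `apply_request` are hypothetical names, and rolling back agents that already succeeded when a later agent fails is an assumption about how a failure is handled across a group.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for one load balancer agent."""
    name: str
    configs: dict = field(default_factory=dict)  # service id -> rendered config
    fail_validation: bool = False  # hook to simulate an invalid config

    def apply(self, service_id, config):
        """Write the new config, check it, and revert if the check fails."""
        previous = self.configs.get(service_id)
        self.configs[service_id] = config
        if self.fail_validation:
            self.revert(service_id, previous)
            return False
        return True

    def revert(self, service_id, previous):
        """Restore whatever config was in place before the request."""
        if previous is None:
            self.configs.pop(service_id, None)
        else:
            self.configs[service_id] = previous


def apply_request(agents, service_id, config):
    """Apply a request to every agent; undo it everywhere if one agent fails.

    Assumption: agents that already applied the change are rolled back.
    """
    previous = {agent.name: agent.configs.get(service_id) for agent in agents}
    applied = []
    for agent in agents:
        if agent.apply(service_id, config):
            applied.append(agent)
        else:
            for done in applied:
                done.revert(service_id, previous[done.name])
            return "FAILED"
    return "SUCCESS"
```

The scheduler side would then only treat new tasks as started on "SUCCESS", and keep the old tasks around otherwise.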
[Diagram: Mesos Master, Singularity Scheduler, and Baragon API with the /email, /meetings, and /reports services, with an ELB added to the api.hubapi.com/email path.]
>45 groups, >200 separate servers: api.hubapi.com/email, app.hubspot.com, forms.hubspot.com, signup.hubspot.com, nav.hubapi.com, website.grader.com, leadin.com, library.hubspot.com
[Diagram: Mesos Master, Singularity Scheduler, Baragon API, and the ELB for api.hubapi.com/email, with the /email, /meetings, and /reports services.]
Why Run LBs in Mesos?
  Reliability
  Scalability
  Easier deployment and upgrades
  Save $$$
Challenges Running LBs in Mesos
  Dependencies
  Logging
  Disposable instances
  Bootstrapping
  No manual touch needed
  Service Discovery (Nginx, ELB)
Dependencies
  Build Nginx + Baragon Agent into a Docker image
  start.sh renders basic configs from the environment
  Run a container with just enough information to bootstrap:
    Where is Baragon Service?
    What group am I in?
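Rendering a basic config from the environment might look something like the sketch below. It is written in Python for illustration only (the talk's start.sh is a shell script), and the variable names `BARAGON_SERVICE_URL`, `LB_GROUP`, and `LB_PORT`, along with the template contents, are hypothetical.

```python
import os
from string import Template

# Hypothetical bootstrap template: just enough for the container to know
# which group it belongs to and where the Baragon service lives.
BOOTSTRAP_TEMPLATE = Template(
    "# group: $group\n"
    "# baragon service: $service_url\n"
    "server {\n"
    "    listen $port;\n"
    "}\n"
)

REQUIRED = ("BARAGON_SERVICE_URL", "LB_GROUP", "LB_PORT")


def render_bootstrap_config(env=os.environ):
    """Render the starting config, failing fast if a setting is missing."""
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise KeyError("missing bootstrap settings: " + ", ".join(missing))
    return BOOTSTRAP_TEMPLATE.substitute(
        group=env["LB_GROUP"],
        service_url=env["BARAGON_SERVICE_URL"],
        port=env["LB_PORT"],
    )
```

Failing fast on a missing setting keeps a misconfigured container from ever reaching the point where it registers itself.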
Logging
  [Diagram: the container writes access.log into the task sandbox, where a Log Tailer reads it.]
Service Discovery
Service Discovery
  Each group has a dedicated port
  Tell the ELB where Nginx is on startup
  Ensure removal from the ELB before shutting down
Startup
  1. Phone home with group name
  2. Most recent configs for group
  3. Agent applies and checks configs
  4. Ready to serve traffic
  5. Add to ELB
  6. Traffic starts flowing to new LB
  7. Continually ensure ELB in sync with active agents
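Step 7 amounts to a reconciliation loop. One way to sketch the comparison at its core, assuming instances are identified by plain string ids (`plan_elb_sync` is a hypothetical helper, not part of Baragon):

```python
def plan_elb_sync(registered, active):
    """Compare what the ELB has registered with the currently active agents.

    Returns (to_add, to_remove): instances to register with the ELB and
    instances to deregister from it, each as a sorted list.
    """
    registered, active = set(registered), set(active)
    to_add = sorted(active - registered)
    to_remove = sorted(registered - active)
    return to_add, to_remove
```

Running this on a timer and applying the two lists would correct any drift between the ELB and the set of agents that are actually alive.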
Shutdown
  1. KILL or other signal sent to agent
  2. Send shutdown notification
  3. Attempt to remove from ELB
  4. start.sh checks for successful removal
  5. If not removed, remove health check html file and wait
  6. Agent container stops Nginx
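Steps 3 through 5 could be sketched as below. The real check lives in start.sh; this Python version is only an illustration, and `drain_before_stop`, its callable parameters, and the wait and poll defaults are all hypothetical.

```python
import os
import time


def drain_before_stop(deregister, is_in_elb, health_file,
                      wait_seconds=30.0, poll=1.0, sleep=time.sleep):
    """Make sure the ELB stops sending traffic before the container exits.

    deregister: callable that asks for removal from the ELB (step 3)
    is_in_elb:  callable returning True while the ELB still lists us (step 4)
    health_file: path of the health check html file served by Nginx
    """
    try:
        deregister()
    except Exception:
        pass  # a failed attempt is caught by the check below

    if not is_in_elb():
        return "removed"

    # Step 5: fall back to failing the health check so the ELB drops us.
    if os.path.exists(health_file):
        os.remove(health_file)
    waited = 0.0
    while waited < wait_seconds and is_in_elb():
        sleep(poll)
        waited += poll
    return "health_check_removed"
```

Passing the ELB calls in as callables keeps the sketch independent of any particular AWS client.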
Problems
  ELB API rate limiting and ELB inconsistencies
  Docker + DNS lookups
  Hard memory limit
Benefits
  Scale with 1 click
  Automatic replacement of lost LBs by the scheduler
  < 20 seconds to launch new LB
  Condense 200+ EC2 instances to <10, saving ~$24,000/month
  ~15 man hours saved every Nginx upgrade
Check It Out!
  http://github.hubspot.com/baragon/
  http://getsingularity.com
  @shsalinas2012