Load Testing at Yandex. Alexey Lavrenuke

Similar documents

How To Test A Web Server

New Relic & JMeter - Perfect Performance Testing

Table of Contents INTRODUCTION Prerequisites... 3 Audience... 3 Report Metrics... 3

Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center

Performance Testing. Why is important? An introduction. Why is important? Delivering Excellence in Software Engineering

1 How to Monitor Performance

Performance TesTing expertise in case studies a Q & ing T es T

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations

A Comparison of Oracle Performance on Physical and VMware Servers

Network Infrastructure Services CS848 Project

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Automated Performance Testing of Desktop Applications

GSX Monitor & Analyzer. for IBM Collaboration Suite

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

Case Study - I. Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008.

depl Documentation Release depl contributors

DNS (Domain Name System) is the system & protocol that translates domain names to IP addresses.

Performance Testing Process

Tool - 1: Health Center

A Talk ForApacheCon Europe 2008

Tableau Server 7.0 scalability

Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools

Scalability Factors of JMeter In Performance Testing Projects

SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

Benchmark Performance Test Results for Magento Enterprise Edition

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

Improved metrics collection and correlation for the CERN cloud storage test framework

Development of Monitoring and Analysis Tools for the Huawei Cloud Storage

1 How to Monitor Performance

Load Balancer Comparison: a quantitative approach. a call for researchers ;)

Getting Started with SandStorm NoSQL Benchmark

A Comparison of Oracle Performance on Physical and VMware Servers

G DATA TechPaper #0275. G DATA Network Monitoring

MMLIST Listserv User's Guide for ICORS.ORG

Testing Big data is one of the biggest

27 th March 2015 Istanbul, Turkey. Performance Testing Best Practice

RTI v3.3 Lightweight Deep Diagnostics for LoadRunner

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon Max Putas

Databricks. A Primer

ArcGIS Enterprise Systems: Designing, Testing and Monitoring

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

Glassbox: Open Source and Automated Application Troubleshooting. Ron Bodkin Glassbox Project Leader

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

Platform as a Service and Container Clouds

PEPPERDATA OVERVIEW AND DIFFERENTIATORS

The Big Data Paradigm Shift. Insight Through Automation

Streamdrill: Analyzing Big Data Streams in Realtime

Tableau Server Scalability Explained

So in order to grab all the visitors requests we add to our workbench a non-test-element of the proxy type.

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems

WhatsUpGold. v3.0. WhatsConnected User Guide

Building a large scale CDN with Apache Trafficserver. Jan van Doorn jan_vandoorn@cable.comcast.com

Installing and Configuring Windows Server Module Overview 14/05/2013. Lesson 1: Planning Windows Server 2008 Installation.

SOA, case Google. Faculty of technology management Information Technology Service Oriented Communications CT30A8901.

Introduction. AppDynamics for Databases Version Page 1

Open Source and Commercial Performance Testing Tools

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

Sparx Systems Enterprise Architect Cloud-based repository hosting

Migrating to vcloud Automation Center 6.1

Mitel Professional Services Catalog for Contact Center JULY 2015 SWEDEN, DENMARK, FINLAND AND BALTICS RELEASE 1.0

SCALING USER-SESSIONS FOR LOAD TESTING OF INTERNET APPLICATIONS

Performance Test Process

Building, testing and deploying mobile apps with Jenkins & friends

XpoLog Center Suite Log Management & Analysis platform

1 Log visualization at CNES (Part II)

Databricks. A Primer

Big Data and Analytics: Challenges and Opportunities

Performance Workload Design

IERG 4080 Building Scalable Internet-based Services

Building a Continuous Integration Pipeline with Docker

COMP9321 Web Application Engineering

THE STATEFUL CONDITION: OR HOW I LEARNED TO STOP WORRYING AND EMBRACE THE CLOUD

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures

Load and Performance Load Testing. RadView Software October

CRITEO INTERNSHIP PROGRAM 2015/2016

Linux A first-class citizen in Windows Azure. Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise

Mark Bennett. Search and the Virtual Machine

COMPARING NETWORK AND SERVER MONITORING TOOLS

DevOps with Containers. for Microservices

How To Set Up Wiremock In Anhtml.Com On A Testnet On A Linux Server On A Microsoft Powerbook 2.5 (Powerbook) On A Powerbook 1.5 On A Macbook 2 (Powerbooks)

The Complete Performance Solution for Microsoft SQL Server

Parallels Cloud Server 6.0

Dry Dock Documentation

IP Address Management: Smoothing the Way to Cloud-Based Services

Tuning Tableau Server for High Performance

Varnish the Drupal way

CloudLinux is a proven solution for shared hosting providers that:

Throughput Capacity Planning and Application Saturation

Performance Testing of a Large Wealth Management Product

LDIF - Linked Data Integration Framework

Performance Analysis of Web based Applications on Single and Multi Core Servers

Transcription:

Load Testing at Yandex Alexey Lavrenuke

Load Testing at Yandex What is Yandex

Yet another indexer

Yet another indexer

Yandex

Yandex s mission is to help people discover new opportunities in their lives. 8

Yandex s mission is to help people discover new opportunities in their lives. Russian search engine Started at 1997 Most popular search engine in Russia 8

Load Testing at Yandex Why do we need Load Testing

Once we accept our limits, we go beyond them. Albert Einstein* * some say Einstein didn't say that but the expression is still beautiful

DevOps Developers have never faced with performance problems before production Sysadmins do not know service's architecture Performance problems highlighted during load testing encourages people to work together solving them 11

Different kinds of problems Stateless services. High throughput and velocity (banner system) Scenario-based services (mail) Different protocols (inter-service communication) Batch systems (statistical services) 12

Load Testing at Yandex Our main tool

Yandex.Tank: a ten-years history Better software is produced by those forced to operate it* phantom is a very fast web-server phantom-benchmark is a plugin for it that acts as a client. And it is also fast Yandex.Tank is built around phantombenchmark. But today it is even more capable * quoted from Theo Schlossnagle's "Operational Software Design" talk announce 14

Yandex.Tank today Yandex.Tank is an opensource project Primary language is Python Default load generator is phantom (C++) JMeter support 15

Yandex.Tank's internal architecture Yandex.Tank is a meta-tool Tank provides common workflow for different load generators Generator should create sufficient load and measure response characteristics Tank has modular design By Dave Hakkens [CC BY-SA 3.0], via Wikimedia Commons 16

Load Testing at Yandex Use Yandex.Tank

Load testing environment at Yandex Cloud of tanks Cloud of targets A service that stores test results DevOps, again: enable developers to perform load tests with minimum efforts 18

Use cases Stateless: phantom + Yandex.Tank Scenario-based: JMeter + Yandex.Tank Different protocols: JMeter, phantom in some cases or custom solutions + Yandex.Tank Batch systems: not using Yandex.Tank. Will discuss them later 19

How to install it Phantom Jmeter Yandex.Tank 20

How to install it Ubuntu PPA Phantom Python package Jmeter Yandex.Tank 20

How to install it Ubuntu PPA Phantom Ubuntu PPA From sources Python package Jmeter Yandex.Tank 20

How to install it Ubuntu PPA Phantom Ubuntu PPA From sources Python package Jmeter Any way you want Yandex.Tank 20

How to install it Docker repository Ubuntu PPA Phantom Ubuntu PPA From sources Python package Jmeter Any way you want Yandex.Tank 21

Configure your first load test.ini files good defaults redefine defaults on multiple levels it is rather easy to use Yandex.Tank to make automated load tests 22

Configure your first load test.ini files good defaults redefine defaults on multiple levels it is rather easy to use Yandex.Tank to make automated load tests 22

Configuration example, load.ini Section header: for each plugin [phantom] 23

Choose target Target address: IP, IPv6 or domain name [phantom] address = my.service.com 24

What do we send Ammo in one of possible formats [phantom] address = my.service.com uris = / /mypage.html /clck/page?data=hello headers = [Host: example.org] [Accept-Encoding: gzip,deflate] 25

Load type Add schedule [phantom] address = my.service.com uris = / /mypage.html /clck/page?data=hello headers = [Host: example.org] [Accept-Encoding: gzip,deflate] rps_schedule = const(1, 40s) 26

First test Save config as load.ini and shoot a smoke test: just some shoots yandex-tank -c./load.ini 27

Load Testing at Yandex Test types

Open and closed systems Closed system Open system Closed systems have negative feedback that makes it impossible to "bury" the service. Users wait for the responses before making new requests No negative feedback in open system. Internet is an open system 29

Finding max performance Closed model, gradually raise thread count each thread sends requests one-by-one yandex-tank -c./load.ini -o "phantom.instances_schedule=line(1, 40, 10m)» -o "phantom.rps_schedule=" 30

Times distribution graph 31

Times distribution graph RPS times distribution (each second) thread count (schedule) 31

Finding max performance on times dist graph response times distribution graph 32

Finding max performance on times dist graph response times distribution graph 32

Finding max performance on times dist graph linear growth response times distribution graph 32

Finding max performance on times dist graph top performance linear growth response times distribution graph 32

Finding max performance on times dist graph top performance linear degraded! growth response times distribution graph 32

Finding point of failure Open system. Hard schedule emulate open system with multitude of threads yandex-tank -c./load.ini -o "phantom.rps_schedule=line(1, 1000, 10m)" -o "phantom.instances=10000" 33

Response times quantiles graph 34

Response times quantiles graph response times load schedule (expected RPS) quantiles 34

Finding point of failure on RT quantile graph response times quantiles 35

Finding point of failure on RT quantile graph imbalance started response times quantiles 35

Finding point of failure on RT quantile graph imbalance started RPS of failure response times quantiles 35

Measuring response times Open system, constant load. Load level from SLA or from previous tests don't forget about warming up yandex-tank -c./load.ini -o "phantom.rps_schedule=line(1, 300, 30s) const(300, 5m)" -o "phantom.instances=10000" 36

Mind your spikes spikes on quantile graph 37

Mind your spikes spikes on quantile graph 37

Common spikes reasons "Heavy" requests in your ammo. See if spikes become more often on higher load Periodical processes on your server. Cron job or cache synchronization. Garbage collector Someone else queries your server periodically 38

Investigating reasons for spikes Service downloads something periodically: memory consumption network send/receive 39

Investigating reasons for spikes Service downloads something periodically: memory consumption network send/receive 39

Finding leaking resources Open system, constant load, long period set your load level at 80-90% from maximum level you've found before yandex-tank -c./load.ini -o "phantom.rps_schedule=line(1, 700, 30s) const(700, 1h)" -o "phantom.instances=10000" 40

Testing methodology Smoke test. Ensure your system is working and you have all the metrics needed Performance test. Closed system, growing number of threads Imbalance test. Open system, hard schedule, linear growth Measuring timings. Open system, constant load level from SLA Find leaking resources. High load level. Any other test you need. 41

Load Testing at Yandex Approaching batch systems

Machine learning system The Ordinary people machine codependent filters with history An interesting person Event log Mark relevant events 43

Machine learning system Mark relevant events 44

Machine learning system Billing Statistics Mark relevant Targeting events 44

Living with a batch system Millions of events in one log. Thus, no RPS Log sizes varied over the day. Can't change load level. Can't make it "flat" or linear Can't use Yandex.Tank or any other load generator. Still want to see performance limits and trends 45

Why having X2 server is never enough Assumption: if we have an X2 server and it works then we have enough time to do something before we reach our performance limit in production But how much time do we have and what exactly should we do? 46

First step: collecting metrics We collected different kind of metrics from all our servers. Some of them were useful in understanding the results of experiments But they work only if the change is instant and big enough 47

Problem: too many metrics There are a lot of metrics. Really a lot And we don't know for sure which of them are the reasons and which are the consequenses Lower the number of dimensions 48

Solution: scatter plots Uncorrelated Linear dependency 49

What scatter plot can tell us Non-linear tail Outliers Non-zero process time for zero-sized logs 50

Compare observations 51

Compare observations Test vs Prod New release vs current Period in the past with today 52

Find trends Basic idea: build linear model on each release and compare the coefficients But the data is too noisy and the model is unstable 53

Making it better Clean the dataset by using density-based clusterization Use points from the biggest cluster to build the linear model Investigate the reasons for outliers and smaller clusters 54

Dig deeper: components dependencies The Ordinary people machine codependent filters with history An interesting person Event log Mark relevant events 55

Dig deeper: components dependencies Ordinary people codependent filters with history An interesting person Event log Mark relevant events 56

Dig deeper: components dependencies Investigated component dependencies Extracted data flow from code Converted them into graph diagram Found critical path: the longest path in that graph 57

Critical path visualization Collected data about each component work times and wait times for each processed log Visualize them on critical path Now we can find bottlenecks and see if they migrate in new releases 58

Testing the batch system Learn about the architecture Collect a lot of metrics. Write tools to collect additional metrics Find correlations Automate trend detection Find the critical path Investigate outliers 59

Load Testing at Yandex Summary, links and contacts

What did we learn today Yandex.Tank: a universal load testing tool Load testing methodology How to approach batch systems 61

Useful links Our chat room: gitter.im/yandex/yandex-tank About Yandex.Tank project: yandex.github.io/yandex-tank Yandex.Tank on github: github.com/yandex/yandex-tank Yandex Tank API on github: github.com/yandex-load/yandex-tank-api phantom on github: github.com/mamchits/phantom Read the docs on ReadTheDocs: yandextank.readthedocs.org 62

Contacts Alexey Lavrenuke testing engineer direvius@yandex-team.ru @direvius, #yandextank

Let s go beyond our limits!