Introduction to Splunk Dashboards for Service Oriented Architecture Monitoring at SurveyMonkey Michael Sela, Engineering Manager.




Introduction to Splunk Dashboards for Service Oriented Architecture Monitoring at SurveyMonkey Michael Sela, Engineering Manager #splunkconf

Agenda
- Introduction
- Current Applications Architecture and Challenges
- Dashboards
- Summary

Mike Sela, Engineering Manager
- Programming computers for nearly 35 years
- No, never used punch cards
- Enterprise Middleware Specialist turned Data Wrangler turned Manager
- Likes puppies and long walks on the beach

SurveyMonkey At a Glance
- World's leading provider of web-based survey solutions
- Founded in 1999
- Dave Goldberg joined as CEO in 2009
- Freemium business with 15 million+ customers worldwide
- 2 million+ survey responses per day
- 250+ monkeys
- HQ in Palo Alto with offices in Portland, Seattle, Portugal, and Luxembourg

SurveyMonkey Applications Architecture - 2010
[Architecture diagram; labels: Load Balancer, .NET, Cache, SQL Server DB]

SurveyMonkey Log Processing 2010

SurveyMonkey Current Applications Architecture

Why Would We Do This?
- Not easier
- Not faster
- BUT it allows us to scale as an engineering organization
- Creates a SurveyMonkey platform for partners

SurveyMonkey Log Processing Early 2012

The Log Problem
- Releases were blind and occurred most days
- How do we monitor the health of dozens of components?
- We went from two log files to ~50
- Very few engineers had production access
- Engineering was last to know about problems
- Did not want to code dozens of different solutions

What Tells Me That a Component is Healthy?
- Volume of requests it is handling
- Response time
- Status codes
How do I easily find this information for all my components?

SurveyMonkey Current Architecture
Most applications were similar and based on the same framework.

The Solution
Splunk, obviously
- Gives access to application logs securely (no more blind releases)
- Enables most everyone to do fancy log analysis
Nginx
- Configurable, robust, open-source web server/router

SurveyMonkey Current Applications Architecture

Nginx
- Routes ALL requests, both front-end and back-end
- Can log all sorts of metadata for each request: timestamp, URL, duration, headers, status, referrer, user agent, length, etc.

Nginx.conf Snippet: Splunk-friendly Logging

    http {
        include /etc/nginx/mime.types;

        # key=value pairs that Splunk can extract as fields at search time
        log_format sm 'time=$time_local,'
                      'rtime=$request_time,status=$status,'
                      'addr=$remote_addr,request=$request';
        access_log /var/log/nginx/access.log sm;
    }

Application Log Sample from Nginx

    time=29/Aug/2013:21:17:21 -0700, rtime=0.006, status=200, addr=10.10.4.8, request=POST /profilesvc/v1/get_user_info HTTP/1.1
    time=29/Aug/2013:21:17:22 -0700, rtime=0.009, status=200, addr=10.10.4.8, request=POST /profilesvc/v1/get_user_info HTTP/1.1
    time=29/Aug/2013:21:17:23 -0700, rtime=0.023, status=200, addr=10.10.4.8, request=POST /profilesvc/v1/update_user_info HTTP/1.1
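A minimal sketch of how one of these key=value lines could be split into typed fields outside Splunk, for instance in a unit test that guards the log format. The regular expression and the function name `parse_line` are assumptions made for illustration and are not from the talk; only the field names come from the `sm` log format. The timestamp is kept as a raw string.

```python
import re

# Field order follows the `sm` log_format; the pattern itself is an assumption.
# Separators accept "," with or without a trailing space.
LINE_RE = re.compile(
    r"time=(?P<time>[^,]+),\s*"
    r"rtime=(?P<rtime>[\d.]+),\s*"
    r"status=(?P<status>\d{3}),\s*"
    r"addr=(?P<addr>[^,]+),\s*"
    r"request=(?P<method>[A-Za-z]+) (?P<page>\S+) (?P<protocol>\S+)\s*$"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["rtime"] = float(record["rtime"])    # request duration in seconds
    record["status"] = int(record["status"])    # HTTP status code
    record["method"] = record["method"].upper()
    return record


if __name__ == "__main__":
    sample = (
        "time=29/Aug/2013:21:17:21 -0700, rtime=0.006, status=200, "
        "addr=10.10.4.8, request=POST /profilesvc/v1/get_user_info HTTP/1.1"
    )
    print(parse_line(sample))
```

Inside Splunk none of this is needed, since the searches in the deck use `rtime` and `status` directly as fields.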

Typical Daily Splunk Dashboard Content
For each of my web services, a dashboard is built with the following:
- Volume for each page/API: last 24 hours and a week ago
- Processing time for each page/API: last 24 hours and a week ago
- Status codes for each page/API: last 24 hours and a week ago

Example Daily Dashboard: Volume

Example Daily Dashboard: Request Time

Example Daily Dashboard: Status Codes

Splunk Dashboard Queries

    index="surveymonkey" source="*nginx/jobsvc*" exportjob | rex field=_raw "request=(POST|GET) (?<page>.+) " | timechart count by page

    index="surveymonkey" source="*nginx/jobsvc*" exportjob | rex field=_raw "request=(POST|GET) (?<page>.+) " | timechart span=30m median(rtime) by page

    index="surveymonkey" source="*nginx/jobsvc*" exportjob | timechart count by status
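To make the three searches concrete, here is a rough Python equivalent that works on already-parsed records. The record layout (dicts with `time`, `page`, `rtime`, `status`) and all helper names are assumptions for illustration. It only approximates `timechart`: buckets are fixed-width and the span is assumed to divide an hour evenly.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import median


def bucket_start(ts, span_minutes=30):
    """Floor a timestamp to the start of its span (span must divide 60)."""
    minute = ts.minute - ts.minute % span_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def volume_by_page(records, span_minutes=30):
    """Roughly `timechart count by page`."""
    counts = Counter()
    for r in records:
        counts[(bucket_start(r["time"], span_minutes), r["page"])] += 1
    return counts


def median_rtime_by_page(records, span_minutes=30):
    """Roughly `timechart span=30m median(rtime) by page`."""
    samples = defaultdict(list)
    for r in records:
        key = (bucket_start(r["time"], span_minutes), r["page"])
        samples[key].append(r["rtime"])
    return {key: median(values) for key, values in samples.items()}


def count_by_status(records, span_minutes=30):
    """Roughly `timechart count by status`."""
    counts = Counter()
    for r in records:
        counts[(bucket_start(r["time"], span_minutes), r["status"])] += 1
    return counts


if __name__ == "__main__":
    start = datetime(2013, 8, 29, 21, 17, 21)
    demo = [
        {"time": start, "page": "/jobsvc/v1/a", "rtime": 0.006, "status": 200},
        {"time": start + timedelta(minutes=20), "page": "/jobsvc/v1/a",
         "rtime": 0.020, "status": 500},
    ]
    print(volume_by_page(demo))
    print(count_by_status(demo))
```

The three functions line up with the three health signals named earlier in the talk: request volume, response time, and status codes.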

Dashboard XML Snippet: Easy Replication
Splunk> Manager >> User interface >> Views >> DashboardJobSvc

    <?xml version='1.0' encoding='utf-8'?>
    <dashboard>
      <label>jobsvc 24 Hour Dashboard</label>
      <row>
        <chart>
          <searchString>index="surveymonkey" source="*nginx/jobsvc*" exportjob | rex field=_raw "request=(POST|GET) (?&lt;page&gt;.+) " | timechart count by page</searchString>
          <title>Call volume by endpoint - Last 24 hours</title>
          <earliestTime>-24h</earliestTime>
          <option name="charting.chart">column</option>
          <option name="charting.chart.stackMode">stacked</option>
          <option name="count">10</option>
          <option name="displayRowNumbers">true</option>
        </chart>
      </row>
    </dashboard>
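Because every service logs through the same Nginx format, the dashboard XML differs only in the service name, so it can be stamped out from a template. The sketch below shows one way that replication might be scripted. The function `build_dashboard`, the panel titles for the second and third charts, and the chart types are assumptions for illustration; the `exportjob` search term from the slide is left out because it looks specific to jobsvc. Element names follow Splunk's legacy simple XML (`searchString`, `earliestTime`).

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: regex shared by the per-endpoint searches.
REX = r'rex field=_raw "request=(POST|GET) (?<page>.+) "'


def build_dashboard(service, index="surveymonkey"):
    """Return simple-XML text for a three-panel, 24-hour dashboard."""
    base = f'index="{index}" source="*nginx/{service}*"'
    panels = [
        ("Call volume by endpoint - Last 24 hours", "column",
         f"{base} | {REX} | timechart count by page"),
        ("Median request time by endpoint - Last 24 hours", "line",
         f"{base} | {REX} | timechart span=30m median(rtime) by page"),
        ("Status codes - Last 24 hours", "column",
         f"{base} | timechart count by status"),
    ]
    dashboard = ET.Element("dashboard")
    ET.SubElement(dashboard, "label").text = f"{service} 24 Hour Dashboard"
    for title, chart_type, search in panels:
        row = ET.SubElement(dashboard, "row")
        chart = ET.SubElement(row, "chart")
        # ElementTree escapes "<" and ">" in the regex when serializing.
        ET.SubElement(chart, "searchString").text = search
        ET.SubElement(chart, "title").text = title
        ET.SubElement(chart, "earliestTime").text = "-24h"
        option = ET.SubElement(chart, "option", name="charting.chart")
        option.text = chart_type
    return ET.tostring(dashboard, encoding="unicode")


if __name__ == "__main__":
    for name in ("jobsvc", "profilesvc"):
        print(build_dashboard(name))
```

Generating the XML through a library rather than string concatenation keeps the named capture group `(?<page>...)` correctly escaped inside the search element.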

But Wait, There's More!
Every nginx log line gets stamped with the machine name:

    time=30/Aug/2013:09:19:11 -0700, rtime=0.012, status=200, addr=10.10.4.8, request=POST /profilesvc/v1/get_user_info HTTP/1.1
    host=sjc-pyweb09  sourcetype=syslog  source=/var/log/nginx/profilesvc.access.log

Hardware statistics exist in other files:

    memtotalmb memfreemb memusedmb memfreepct memusedpct pgpageout swapusedpct pgswapout cswitches  interrupts forks   processes threads loadavg1mi
    32226      14341     17885     44.5       55.5       216778516 0.0         0         2572893770 3952532792 4558405 482       1596    0.02
    host=sjc-pyweb09  sourcetype=vmstat  source=vmstat
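The vmstat event is a header row followed by a row of values, so pairing the two gives named metrics. A small sketch of that pairing, assuming whitespace-separated columns exactly as shown on the slide; the function name `parse_vmstat` is hypothetical.

```python
def parse_vmstat(header_line, value_line):
    """Pair each column name with its numeric value."""
    names = header_line.split()
    values = value_line.split()
    if len(names) != len(values):
        raise ValueError(
            f"expected {len(names)} values, got {len(values)}"
        )
    return {name: float(value) for name, value in zip(names, values)}


if __name__ == "__main__":
    header = (
        "memtotalmb memfreemb memusedmb memfreepct memusedpct pgpageout "
        "swapusedpct pgswapout cswitches interrupts forks processes "
        "threads loadavg1mi"
    )
    row = (
        "32226 14341 17885 44.5 55.5 216778516 0.0 0 2572893770 "
        "3952532792 4558405 482 1596 0.02"
    )
    stats = parse_vmstat(header, row)
    print(stats["memfreemb"], stats["loadavg1mi"])
```

The sample row is internally consistent: free plus used memory (14341 + 17885) equals the 32226 MB total.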

Hardware Monitoring by Software Component
Splunk correlates applications with hardware health:
- Pick a timeframe
- Pick a component
- View real-time stats: memory (free and used), load average (~CPU), swap, and much more

Hardware Monitoring by Software Component
Splunk correlation in action for jobsvc

Hardware Query by Component XML Snippet

    <row>
      <chart>
        <title>Load Average</title>
        <option name="charting.chart">line</option>
        <searchTemplate>index=surveymonkey sourcetype=vmstat $time$ [search index="surveymonkey" source="*nginx/$service$*" earliest=-20m | dedup host | table host] | timechart avg(loadavg1mi) by host</searchTemplate>
      </chart>
    </row>
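The bracketed subsearch is what ties software to hardware: it first finds the hosts that recently logged requests for the chosen service, and the outer search then keeps only vmstat events from those hosts. A simplified Python sketch of the same two-step join follows. The event dicts and function names are assumptions for illustration, and the time window (`earliest=-20m`, `$time$`) and the per-bucket charting are left out.

```python
from collections import defaultdict


def hosts_for_service(nginx_events, service):
    """Step 1 (the subsearch): distinct hosts logging this service."""
    marker = f"nginx/{service}"
    return {e["host"] for e in nginx_events if marker in e["source"]}


def avg_load_by_host(vmstat_events, hosts):
    """Step 2 (the outer search): average loadavg1mi per matching host."""
    totals = defaultdict(lambda: [0.0, 0])   # host -> [sum, count]
    for event in vmstat_events:
        if event["host"] in hosts:
            acc = totals[event["host"]]
            acc[0] += event["loadavg1mi"]
            acc[1] += 1
    return {host: total / count for host, (total, count) in totals.items()}


if __name__ == "__main__":
    nginx = [
        {"host": "sjc-pyweb09",
         "source": "/var/log/nginx/profilesvc.access.log"},
    ]
    vmstat = [
        {"host": "sjc-pyweb09", "loadavg1mi": 0.02},
    ]
    hosts = hosts_for_service(nginx, "profilesvc")
    print(avg_load_by_host(vmstat, hosts))
```

The join key is the `host` field, which is why stamping every log line with the machine name matters.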

Summary
- Splunk lets me access logs while keeping production machines secure
- Generate Splunk-friendly logs from a common layer of your architecture that sees all requests (e.g. Nginx)
- Use Splunk to correlate across various sources of machine data, including log files, to simplify monitoring and increase visibility
- Generate dashboards that confirm health in seconds

Questions?

The End