Machine Data Analytics with Sumo Logic


Machine Data Analytics with Sumo Logic A Sumo Logic White Paper

Introduction

Today, organizations generate more data in ten minutes than they did during the entire year in 2003. This exponential growth of machine data creates a new set of challenges for IT and Operations teams. Data originates from all areas of an infrastructure: custom applications, network infrastructure, servers and sensors of all types, and even social media applications. The rapid growth of consumers coming online, the interactions between humans and applications, and the interactions across complex distributed systems and applications themselves are just a few of the factors driving this enormous data growth.

Machine data is predominantly made up of log files, and contains critical information about how the systems work in isolation and in conjunction with one another. Importantly, this machine data is fundamentally needed to troubleshoot and monitor your entire infrastructure. At an IT level these logs provide critical information to identify, for example:

+ Which server processes are consuming the most resources
+ The latency between network segments
+ What exceptions have occurred in the last five minutes

The information contained within these logs can also help shed light on a variety of critical business-related topics, for example:

+ How customers behave on revenue-generating applications
+ The likelihood of customer churn
+ Which application features are most popular and drive customer engagement
+ ...and much more

The Two Major Challenges With Analyzing Machine Data

Most people have heard of the 3 V's of Big Data: velocity, variety and volume. These characteristics certainly apply to unstructured or semi-structured log files, which are generated across multiple and diverse environments: data centers, public and private clouds, and virtualized infrastructures. Given that a single transaction can touch multiple, and often over a dozen, unique infrastructure components across these diverse environments, troubleshooting and monitoring the impact of these transactions across different systems becomes virtually impossible.

The first challenge, which comes with the rapid volume growth of machine data, is that the amount of noise in the system can completely overwhelm the relevant signals that would provide insights about the organization's infrastructure and applications (see figure below).

[Figure: Machine Data Today / Machine Data Tomorrow]

The second challenge that organizations face revolves around the human limitation in knowing what to ask of the data. There are two core types of analytics around machine data, and most organizations fail to distinguish the relevance of each:

+ Analyzing and answering questions you know to ask about your infrastructure. This is the "known unknowns" problem, the resolution to which involves functions like iterative search, alerting, reporting and dashboard visualization.
+ Gaining insights even when you don't know what questions to ask: the "unknown unknowns" problem. Fundamentally, people can't glean insights from machine data when they don't know where to look or what to look for.

The difficult "unknown unknowns" problem is where machine learning comes in, as only modern machine data science, with automated, algorithm-driven predictive analytics, can reach beyond human limitations to extract insights from massive volumes of Big Data.

The Sumo Logic Approach

Our approach towards analytics around machine data tackles both problems: how to address the sheer quantity of data and how to support both styles of analytics.

Meeting Big Data Volume Requirements

Universal Collection

Sumo Logic enables enterprises to collect and analyze machine data from virtually any source regardless of volume, format, or location. This includes servers, virtualization infrastructure, network devices, security infrastructure, custom and 3rd-party applications, databases, RFID scanners and more. These sources can be located on-premise, in the cloud, and in virtual environments, and can generate data volumes well into the terabytes per day.

Secure and Reliable Collection

Sumo Logic is designed from the ground up to securely and reliably collect data from any enterprise environment, including those with Big Data scale requirements. Data is securely and reliably collected through either local collection (via Sumo Logic Collectors) or through hosted collection (via HTTPS or directly from Amazon S3). The Sumo Logic Collector is a small-footprint software application that can be deployed locally or remotely from the host data source. Sumo Logic Collectors compress data 10x, encrypt all data before transmitting to the Sumo Logic service, and cache all data to ensure data is never lost due to network issues. All data is collected in raw, or unstructured, format with no need to parse or understand the data upfront; all data processing and parsing is handled in the cloud. By separating collection from processing and parsing, which occur entirely in the Sumo Logic service, there is no need to update complex parsing logic on every Collector. Consequently, Collector performance is significantly improved and management overhead significantly reduced.
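As a rough illustration of the compress-and-cache behavior described for Collectors, the following Python sketch forwards raw, unparsed log batches and keeps each batch cached until delivery succeeds. It is not Sumo Logic's Collector code: the class name `BufferingForwarder`, the `send` callback, and the in-memory queue are all hypothetical, and encryption is assumed to be handled by the transport (for example HTTPS) and is not shown.

```python
import gzip
from collections import deque
from typing import Callable, Deque, List


class BufferingForwarder:
    """Hypothetical forwarder: compresses raw log batches and keeps them
    cached until the send function reports success."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        # `send` is an assumed callback that returns True on successful delivery.
        self._send = send
        self._pending: Deque[bytes] = deque()

    def enqueue(self, raw_lines: List[str]) -> None:
        # Lines are forwarded unparsed; only compression happens locally.
        payload = "\n".join(raw_lines).encode("utf-8")
        self._pending.append(gzip.compress(payload))

    def flush(self) -> int:
        """Try to send cached batches in order; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # network problem: keep the batch for the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of batches still waiting to be delivered."""
        return len(self._pending)
```

Because no parsing happens in the forwarder, a change to parsing rules would only need to be made on the receiving side, which is the design point the section makes about separating collection from processing.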
Data can also be sent to the Sumo Logic service via hosted collection. Through hosted collection, customers send data directly from the data source to Sumo Logic, without adding any footprint to their IT infrastructure. Hosted collection can be deployed for on-premise environments, SaaS/IaaS/PaaS environments, and for direct collection from an S3 bucket in Amazon.

Log Aggregation and Centralization

The Sumo Logic service offers a single repository for all machine data. Given today's distributed systems and environments, the ability to collect and centralize all machine data across disparate distributed systems and applications is paramount. As a single transaction can traverse multiple environments, having all data in a single location is critical for customers to effectively troubleshoot errors and efficiently determine the root cause of any failures.

Globally Distributed Cloud Architecture

Sumo Logic fundamentally believes that only an architecture that elastically and seamlessly scales can address the performance and scalability issues that companies with home-grown or on-premise commercial solutions face. To solve those challenges, our patent-pending Elastic Log Processing engine scales every component of the Sumo Logic service individually, based on demand, CPU, or I/O requirements. As a result, our log management and analytics service is capable of scaling to Big Data volumes of multiple terabytes per day, and furthermore supports bursting requirements for any company whose application or infrastructure load can vary dramatically over the course of a year, month, or even day. And, because we are 100% cloud-based, companies no longer need to provision extra hardware, software, storage, let alone human assets, to support this effort.

The Two Types of Analytics

Underpinning the analytics service that we provide is the Sumo Logic streaming query engine. This engine provides companies with the ability to constantly get real-time updates around their entire infrastructure, whether via visualization through dashboards, ad hoc queries, or scheduled queries and reports.
"Known Unknowns": Search, Alerts, and Dashboards

Setting up search queries whose results find their way into reports and dashboards requires the user to know what they're searching for, be it an error condition, specific application attributes, or network performance. Using Sumo Logic's powerful query language, customers can iteratively slice and dice data, and express all the questions they need to ask. All searches are done with Sumo Logic's search engine-like syntax, incorporating keywords, wildcards, and Boolean logic. Queries can be evaluated in an incremental fashion, while intermediate results are pushed immediately to the web-based UI. Data fields can be parsed on-the-fly for inclusion in further analysis, including statistical analysis enabled by Sumo Logic's full support of mathematical libraries.

Sumo Logic's extensive query options mean that questions can be posed and answered quickly. Instead of following hunches and making educated guesses, IT teams can quickly scour massive amounts of data in search of the anomalies, error reports or patterns that will pinpoint the source of the problem.

The Sumo Logic log management and analytics service also enables early warning through threshold-based alerts. After identifying the occurrence of a precise condition, such as a specific number of instances of a particular exception or an average response time in excess of an acceptable value, Sumo Logic provides immediate notification to customers to enable investigation and issue resolution. Alerts can be triggered either when a threshold is met or not met, i.e. when an event that shouldn't occur does, or when an event that should occur doesn't.

To visualize data and monitor complex environments in real-time, Sumo Logic enterprise dashboards take data fresh off the wire and display it with near-zero latency. Our Elastic Log Processing Engine powers our dashboards, rendering them capable of processing terabytes of data and delivering immediate results. These dashboards provide a variety of visualization options including line, bar, column, and table charts. Dashboards can include overlay data from multiple sources to enable visual correlation and highlighting of relationships between operational metrics and business results.

"Unknown Unknowns": LogReduce and Anomaly Detection

Today's homegrown and on-premise commercial log management solutions only provide responses to questions that humans know to ask ("known unknowns"). To solve this challenge, Sumo Logic developed our patent-pending LogReduce technology that leverages modern machine learning to proactively identify patterns and insights for the many situations when users don't know what questions to ask. LogReduce takes millions of lines of log results and distills them into a discernible set of underlying patterns, all without users ever writing a specific query. In other words, LogReduce reduces the prevalence of "unknown unknowns" and turns them into "known knowns."
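To make the idea of distilling many log lines into a few underlying patterns concrete, here is a minimal Python sketch. It is not the LogReduce algorithm, whose details this paper does not describe; it swaps in a much simpler technique, assumed for illustration only, that masks tokens likely to vary (numbers, dotted numbers such as IP addresses, hex IDs) and groups lines by the resulting signature. The function names `signature` and `reduce_logs` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tokens that usually vary between otherwise identical log lines
# (an assumption of this sketch, not a description of LogReduce).
_VARIABLE_TOKEN = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def signature(line: str) -> str:
    """Replace numbers, dotted numbers (IPs, versions) and hex IDs with '*'."""
    return _VARIABLE_TOKEN.sub("*", line)


def reduce_logs(lines: List[str]) -> List[Tuple[str, int]]:
    """Group lines by signature; return (signature, count), most common first."""
    counts = Counter(signature(line) for line in lines)
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "Connection from 10.0.0.5 port 5123 failed",
        "Connection from 10.0.0.9 port 6001 failed",
        "Disk usage at 91 percent",
    ]
    for sig, count in reduce_logs(sample):
        print(count, sig)
```

In this toy version, the two connection failures collapse into one pattern with a count of two, which is the kind of summary a user can scan without writing a query first.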
Furthermore, LogReduce is capable of learning and improving over time, by offering users the ability to refine and personalize results.

With refinement, users have the ability to improve the automatically extracted signatures by splitting overly generalized patterns into finer-grained signatures, or editing overly specific signatures to mark fields as wildcards. These modifications will then be remembered by the Sumo Logic system.

With personalization, LogReduce helps users uncover the insights most important to them by capturing user feedback and using it to shape the ranking of the returned results. Users can promote or demote signatures to ensure that they do (or do not) appear at the top of LogReduce results. Besides obeying this explicit feedback, Sumo Logic also uses this information to compute a relevance score, which is used to rank signatures according to their content. These relevance profiles are individually tailored to each Sumo Logic user to improve the customer experience.
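The effect of explicit promote and demote feedback on ranking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: feedback is encoded here as +1 (promoted) or -1 (demoted) per signature, ties are broken by how often the signature occurs, and the content-based relevance score mentioned above is not modeled. The function name `rank_signatures` is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_signatures(counts: Dict[str, int], feedback: Dict[str, int]) -> List[str]:
    """Order signatures: promoted (+1) first, demoted (-1) last,
    and by descending occurrence count within each group."""

    def sort_key(sig: str) -> Tuple[int, int]:
        # Signatures without feedback get a neutral value of 0.
        return (-feedback.get(sig, 0), -counts[sig])

    return sorted(counts, key=sort_key)
```

For example, with counts `{"a": 5, "b": 3, "c": 1}` and feedback `{"c": 1, "a": -1}`, the rare but promoted signature `"c"` is listed first and the frequent but demoted signature `"a"` last.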

Sumo Logic Anomaly Detection further extends LogReduce by enabling users to automatically detect changes to baseline patterns and system behaviors. Anomaly Detection automates the process of detecting anomalies and notifies users of deviations from the norm. Anomaly Detection first analyzes all the log data collected, looking for anomalies and unfamiliar patterns. From that data, it makes a summary list of the most compelling and business-critical events and presents those findings to our users, who can select the items of greatest interest and drill down to investigate further.

Conclusion

Sumo Logic focuses on combining the best of human-based interactions (searches, alerts, dashboards, etc.) with machine learning (LogReduce, Anomaly Detection) to provide insights from machine data and enable enterprises to satisfy both business and operational requirements. As a result, Sumo Logic enables enterprises to successfully extract signal from growing amounts of machine data, rather than being inundated by noise (see figure below). Integrating these capabilities into a single cloud-based service provides not just the benefits of the analytics but also lower TCO and faster time-to-value.

[Figure: Machine Data Today / Machine Data Tomorrow / Without Sumo Logic]