Beyond Logging and Monitoring: New Techniques for Solving J2EE Application Problems in Production Environments
David Kadouch, BMC Software
Abstract
Application downtime in production systems can cost anywhere from $100,000 to millions of dollars per hour. For most organizations, problem resolution remains a manual, error-prone, communication-intensive activity that involves multiple levels of IT staff, which makes it lengthy and costly. Current solutions for solving problems in production include logging and monitoring of application servers, but these often fall short of providing the information needed to solve critical problems. This presentation demonstrates how application problem resolution (APR) technology can accelerate problem resolution in production systems and significantly reduce IT costs. By automating the problem resolution process you can:
- Eliminate the need to recreate problems and the environment in which they occur
- Reduce IT support costs
- Eliminate negative business impact
- Pinpoint performance bottlenecks
- Identify configuration problems and functional errors
Agenda
- Application Problem Resolution (APR) challenges
- Traditional tools and techniques
- The AppSight APR System
- Combining monitoring, logging and APR
- Live demo of AppSight APR: 4 scenarios
- Conclusions
Problem Resolution Is a Life Cycle Challenge
[Figure: constituents involved in problem resolution across pre-production and production: users/customers, test/QA, support and development*]
* NIST, May 2002: Economic Impacts of Inadequate Infrastructure for Software Testing
Application Problem Resolution Process
[Figure: problem resolution process, split into 80% root cause analysis and 20% fix]
- Root cause analysis is a manual, iterative process
APR Process Issues: Gather Phase
- Information is spread across numerous sources and silos
- Retrieving it is a manual process
- Gaining access to the production environment is difficult
- The process becomes iterative if the initial guess is wrong
- Gathering is a time-consuming task
APR Process Issues: Recreate Phase
- The SAME environment must be recreated exactly: hardware, network, software and database
- The SAME problem must be reproduced exactly
- The symptom does not reflect the problem's root cause
APR Process Issues: Analyze Phase
- Trial-and-error process
- Manual analysis process
- Collaboration issues
Traditional Techniques: Logging
- Ubiquitous
  - Almost every system uses text-based log files
  - Logging is already in place and therefore should be leveraged
- Not flexible
  - Log statements are written during coding and cannot be added or changed at runtime
- Painful, time-consuming analysis
  - Unstructured format: flat, not organized
  - Not easily searchable
- Partial information
  - No configuration information, no performance metrics
  - Does not include context
Traditional Techniques: Monitoring
- Great for detecting problems
  - Can detect performance problems and resource consumption
  - Can notify operators by sending alerts
  - Can proactively identify trends that may lead to problems
- Not very useful for solving problems
  - Provides only statistical data
  - Does not provide information on individual transactions
  - Cannot drill down into the executing code
- Designed for IT operators
  - Of very little value to developers
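The point that monitoring provides only statistical data, and nothing about individual transactions, can be seen in a minimal Java sketch. AggregateMonitor is a hypothetical class, not the API of any monitoring tool; it assumes the monitor keeps only a running count and total, as aggregate dashboards typically do.

```java
import java.util.List;

// Hypothetical class for illustration only; it is not the API of any
// monitoring product. It keeps only aggregate statistics.
final class AggregateMonitor {
    private long count;
    private long totalMillis;

    // Record one transaction's response time. Only the running totals are
    // kept, so the individual value is lost once it has been added.
    void record(long responseMillis) {
        count++;
        totalMillis += responseMillis;
    }

    // Mean response time over all recorded transactions (0 if none).
    double averageMillis() {
        return count == 0 ? 0.0 : (double) totalMillis / count;
    }

    long count() {
        return count;
    }

    // Convenience helper: feed a batch of response times, return the mean.
    static double averageOf(List<Long> responseTimesMillis) {
        AggregateMonitor monitor = new AggregateMonitor();
        for (long t : responseTimesMillis) {
            monitor.record(t);
        }
        return monitor.averageMillis();
    }
}
```

For example, with 99 transactions at 100 ms and one at 10,000 ms, AggregateMonitor.averageOf returns 199.0 ms. The single slow transaction barely moves the average, and nothing the monitor stored identifies which transaction it was or what code it executed.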
AppSight Problem Resolution System
AppSight Application Problem Resolution System
[Figure: problem definition and monitoring; problem detection, capture and alerting; assessment and triage]
- Next-generation monitoring built on top of a unique problem resolution architecture
- Cross-platform: J2EE, .NET and Windows; client and server side of the application
- Works across the application lifecycle, in production and pre-production, for operations, support, QA and development
- Role-based views for transaction and problem analysis
AppSight Complete Solution
- User: mouse, keyboard and screen activity
- System: transactions, metrics and J2EE components
- Code: actual code execution, methods and variables
- User, system and code views are synchronized, so there is no need to recreate the environment
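The idea of synchronized user, system and code views can be sketched as merging timestamped events from each layer into one ordered timeline. This is a minimal Java sketch under stated assumptions: TraceEvent and Timeline are hypothetical names, not AppSight's data model, and all events are assumed to carry timestamps from synchronized clocks. A real cross-machine recorder would also have to correct for clock skew, which is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One captured event. The layer names ("user", "system", "code") are
// illustrative assumptions, not AppSight's data model.
final class TraceEvent {
    final long timestampMillis;
    final String layer;
    final String detail;

    TraceEvent(long timestampMillis, String layer, String detail) {
        this.timestampMillis = timestampMillis;
        this.layer = layer;
        this.detail = detail;
    }

    @Override
    public String toString() {
        return timestampMillis + " [" + layer + "] " + detail;
    }
}

final class Timeline {
    // Merge per-layer event lists into one list ordered by timestamp.
    // Assumes all timestamps come from synchronized clocks. List.sort is
    // stable, so events with equal timestamps keep their input order.
    @SafeVarargs
    static List<TraceEvent> merge(List<TraceEvent>... layers) {
        List<TraceEvent> all = new ArrayList<>();
        for (List<TraceEvent> layer : layers) {
            all.addAll(layer);
        }
        all.sort(Comparator.comparingLong(e -> e.timestampMillis));
        return all;
    }
}
```

Reading the merged list top to bottom shows, for example, a user click, the database connection it caused and the method that handled it in the order they happened, which is what makes a recorded problem analyzable without reproducing it.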
Solve Various Types of Problems
- End-user
- Functional
- Configuration
- Performance
[Figure: AppSight log]
Support for Distributed Environments
- J2EE application servers
- Web servers
- Windows clients
APR Leverages Existing Investments
[Figure: production system with BlackBox, monitoring system, log files, AppSight Repository and AppSight Consoles]
1) Identify the problem (monitoring system)
2) Start BlackBox recording
3) Record the problem (BlackBox on the production system)
4) Include standard logging in the BlackBox log
5) Analyze the problem
6) No need to recreate the problem or the environment
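Steps 2 to 4 of this workflow can be sketched in Java by registering a recorder as a standard java.util.logging Handler, so that log statements the application already contains land in the same recording as captured events. BlackBoxRecorder and its start, recordEvent and snapshot methods are hypothetical illustrations, not AppSight's API; only Handler, LogRecord and Logger are real JDK classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

// Hypothetical stand-in for a black-box recorder; the class and method
// names are illustrative and are not AppSight's API. It is registered as
// a java.util.logging Handler so that existing log statements flow into
// the same recording as captured application events (step 4).
final class BlackBoxRecorder extends Handler {
    private final List<String> entries = new ArrayList<>();
    private volatile boolean active;

    // Step 2: called once the monitoring system has identified a problem.
    void start() {
        active = true;
    }

    void stop() {
        active = false;
    }

    // Step 3: record an application event; ignored unless recording is on.
    synchronized void recordEvent(String event) {
        if (active) {
            entries.add("EVENT " + event);
        }
    }

    // Step 4: standard log records go into the same black-box log.
    @Override
    public synchronized void publish(LogRecord record) {
        if (active && isLoggable(record)) {
            entries.add("LOG " + record.getLevel() + " " + record.getMessage());
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered outside the in-memory list.
    }

    @Override
    public void close() {
        active = false;
    }

    // Step 5: a copy of the recording, handed to whoever analyzes it.
    synchronized List<String> snapshot() {
        return new ArrayList<>(entries);
    }
}
```

For example, after logger.addHandler(recorder), a logger.warning(...) call made before start() is ignored, while one made after start() appears in snapshot() interleaved with any recordEvent entries, in the order they occurred.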
Live Demos
AppSight Sample Deployment Environment
[Figure: sample deployment with a database, a mainframe, legacy UNIX apps, legacy Windows apps, a J2EE EJB tier and a J2EE web tier (WebLogic, WebSphere, JBoss), IIS web servers, other (non-Windows) web servers, and QA/remote browser or thick clients; recording is done by J2EE Black Boxes and Windows Black Boxes]
Demo 1: Supporting Users in Production
Production Support: Reporting a Problem
[Figure: production servers, AppSight Repository, Remedy WebSupport]
- Only colored transactions are recorded
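Recording only colored transactions can be sketched as marking each transaction at entry and letting instrumented code record only while the mark is set. This minimal Java sketch assumes that the color is chosen by user ID (for instance, the user who reported the problem) and is carried in a ThreadLocal for the duration of a request; ColoredRecorder and its methods are hypothetical, not AppSight's mechanism.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "transaction coloring": each transaction is
// marked (colored) when it starts, here by user ID as an assumed rule,
// and instrumented code records events only for colored transactions.
final class ColoredRecorder {
    private final Set<String> usersToColor;
    private final ThreadLocal<Boolean> colored = ThreadLocal.withInitial(() -> Boolean.FALSE);
    private final List<String> recorded = new ArrayList<>();

    ColoredRecorder(Set<String> usersToColor) {
        this.usersToColor = new HashSet<>(usersToColor);
    }

    // Called at transaction entry, e.g. from a servlet filter (assumed).
    void beginTransaction(String userId) {
        colored.set(usersToColor.contains(userId));
    }

    // Called from instrumented code; a no-op for uncolored transactions.
    void record(String event) {
        if (colored.get()) {
            synchronized (recorded) {
                recorded.add(event);
            }
        }
    }

    // Called at transaction exit so a pooled thread does not carry the
    // color into the next, unrelated transaction.
    void endTransaction() {
        colored.remove();
    }

    List<String> recorded() {
        synchronized (recorded) {
            return new ArrayList<>(recorded);
        }
    }
}
```

Because application servers reuse pooled threads, endTransaction removes the ThreadLocal value; otherwise a later, uncolored request on the same thread could be recorded by mistake.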
Production Support: Problem Analysis
[Figure: support engineer with the Black Box log attachment]
Demo 2: Capturing Performance Problems in Production
Production Monitoring: Performance Problem
[Figure: production servers, AppSight Repository, Remedy]
- Production servers record with a light profile
- Remedy: alert on slow response time
Production Monitoring: Performance Problem (continued)
[Figure: production servers, AppSight Repository, Remedy]
- Production servers record with a deep profile
- The application execution flow is captured
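The switch from a light to a deep recording profile after a slow-response alert can be sketched as a threshold check that escalates the recording level exactly once. In this minimal Java sketch, AdaptiveRecorder and RecordingProfile are hypothetical names, the threshold is an assumed parameter, and only the escalation decision is modeled, not the profiling itself.

```java
import java.util.concurrent.atomic.AtomicReference;

// Recording depth; the two levels mirror the slides' "light" and "deep"
// profiles, but what each level captures here is an assumption.
enum RecordingProfile { LIGHT, DEEP }

// Hypothetical sketch, not AppSight's API: record with a light profile by
// default and escalate to a deep profile once a transaction exceeds a
// response-time threshold.
final class AdaptiveRecorder {
    private final long thresholdMillis;
    private final AtomicReference<RecordingProfile> profile =
            new AtomicReference<>(RecordingProfile.LIGHT);

    AdaptiveRecorder(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    RecordingProfile profile() {
        return profile.get();
    }

    // Light profile: every completed transaction reports only its response
    // time. Returns true only for the call that triggers the escalation,
    // which is where an alert would be raised (the slides route it to Remedy).
    boolean onTransactionComplete(long responseMillis) {
        return responseMillis > thresholdMillis
                && profile.compareAndSet(RecordingProfile.LIGHT, RecordingProfile.DEEP);
    }

    // Deep profile: the execution flow is captured only after escalation.
    boolean shouldCaptureExecutionFlow() {
        return profile.get() == RecordingProfile.DEEP;
    }
}
```

Using compareAndSet means that when several slow transactions complete concurrently, only one of them reports the escalation, so a single alert is raised.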
Production Monitoring: Problem Analysis
[Figure: support engineer]
Conclusions
- Problem resolution is a lifecycle challenge
  - The cost of a problem increases dramatically the later it is found in the lifecycle
  - The impact of a problem can cause projects to fail or be significantly delayed
- Benefits of re-architecting the problem resolution process
  - Improved productivity of development and QA processes
  - Improved effectiveness of production support processes
  - Removal of non-reproducible problems
  - Development spends less time solving problems
  - Support is better able to resolve problems before escalating them