The Road to Technical Monitoring with SAP Solution Manager
Heiko Zuerker
ALM230
Copyright 2012 Rockwell Automation, Inc. All rights reserved.

Agenda
- Rockwell Automation's SAP and Solution Manager Landscape
- Run SAP Like a Factory and Operations Control Center
- Initial Scope for Technical Monitoring
- From CCMS to Technical Monitoring
- Problems and Challenges
- Next Steps

Rockwell Automation
- Manufacturer of industrial automation control and information solutions
- World headquarters in Milwaukee, WI
- Annual sales about $6 billion
- Serving customers in more than 80 countries
- Over 400 locations worldwide
- About 21,000 employees
- About 36,000 SAP users (internal & external)

SAP Landscape
- 8 SAP Environments
- Dual Track Development
  - Break / Fix
  - Next Release
- Single Global SAP Instance
- Production
  - 16 SAP Systems (ECC, CRM, BI, Portal, PI, etc.)
  - 58 Dialog Instances on 34 servers (8 CPU cores / 192 GB RAM)
  - 25 TB of storage at database layer

Solution Manager Landscape
- Sandbox Solution Manager
- Production Solution Manager
  - All environments except Sandbox (220 SAP Instances connected)
  - Technical Monitoring
  - Business Process Monitoring
  - Business Process Analytics
  - CCMS
  - Root Cause Analysis
  - CTS+
  - Central Performance History
  - Data Consistency Management
  - Job Scheduling Management

Solution Manager Production
- Solution Manager 7.1 SP4
- Database 1.2 TB with 120 GB growth per month
- 2 Application Servers
  - Java
    - 4 server nodes each
    - 4 GB heap for each server node
  - ABAP
    - 60 dialog work processes
    - 30 background work processes
    - 16 GB extended memory / 12 GB heap
- Total of 4 application servers needed, according to sizing forecast
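
The database figures on this slide lend themselves to a quick capacity estimate. The following is a minimal sketch, assuming growth stays constant at the stated 120 GB per month and that 1 TB is counted as 1000 GB; the function name is illustrative and not part of any SAP tool.

```python
def projected_db_size_gb(current_gb: float, monthly_growth_gb: float, months: int) -> float:
    """Linear projection: current size plus a constant monthly growth."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return current_gb + monthly_growth_gb * months


# Figures from the slide: 1.2 TB today, 120 GB growth per month.
# Assumptions: 1 TB = 1000 GB, and growth remains linear.
CURRENT_GB = 1200.0
MONTHLY_GROWTH_GB = 120.0

size_after_one_year = projected_db_size_gb(CURRENT_GB, MONTHLY_GROWTH_GB, 12)
print(f"Projected size after 12 months: {size_after_one_year:.0f} GB")
```

Under these assumptions the database would more than double within a year (1200 GB + 12 x 120 GB = 2640 GB), which is why storage for Solution Manager should be planned ahead.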

What is Run SAP Like a Factory (RSLF)?
Run SAP Like a Factory is the SAP approach to help customers implement and continuously improve the operations of an IT Business Solution. The objective is to operate with minimal costs and effort.
- Practical and effective delivery based on short project cycles
- Achieved by simplification and automation of operation procedures or optimization of the system landscape and implemented business processes
Run SAP Like a Factory provides methodology, content, and tools combined with premium service and trainings.
Source: SAP

What is an Operations Control Center (OCC)?
An Operations Control Center (OCC) consists of:
- A set of central monitors, which permanently report the status of the business processes and related IT landscapes, including important business and technical exceptions. In alignment with SAP's support standards, the monitors are part of:
  - Application Operations
  - Business Process Operations
- An infrastructure, which pro-actively monitors the solution 24x7 without manual effort, and which triggers and correlates alerts in case of problems. The alerts are bundled in an alert inbox.
- A small team of operators, who only work on the alerts in a standardized way:
  - Either perform pre-configured simple analysis procedures (Guided Procedures).
  - Or convert the alert into an incident / service request for processing by the next level support.
Source: SAP

Why are we implementing RSLF and OCC?
- Reduction of operational costs
- Improve stability of SAP environment
- Data inconsistencies between SAP systems
- Large volume of technical errors
- Technical errors are not associated with business transactions
- Incomplete or lack of actionable monitoring
- High level of manual monitoring

How will we accomplish this with RSLF and OCC?
- Centralized monitoring based on SAP Solution Manager functionality
- Pro-active monitoring
- Automated monitoring
- Holistic monitoring covering technical and functional areas
- Streamlined operational processes
- Reduction of Noise
- Alert only if it is actionable
- Resolve reoccurring issues
- Problem Management
- Remove highly skilled resources from simple tasks

Monitoring versus Alerting
- Monitoring
  - Collection of metrics for reporting and dashboards
  - Source for alerts
  - Not actionable
- Alerting
  - Notification of a problem
  - Multiple metrics can be in one alert
  - Actionable
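
The distinction between metrics and alerts can be illustrated with a small data model. This is a sketch only: the class names, thresholds, and the "worst metric wins" aggregation rule are assumptions chosen for illustration, not the Solution Manager data model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Rating(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass(frozen=True)
class Metric:
    """A collected value with thresholds; on its own it is only data."""
    name: str
    value: float
    yellow_at: float
    red_at: float

    def rating(self) -> Rating:
        if self.value >= self.red_at:
            return Rating.RED
        if self.value >= self.yellow_at:
            return Rating.YELLOW
        return Rating.GREEN


@dataclass(frozen=True)
class Alert:
    """Groups several metrics into one notification."""
    name: str
    metrics: Tuple[Metric, ...]

    def rating(self) -> Rating:
        # Assumed rule: the alert takes the worst rating of its metrics.
        return max((m.rating() for m in self.metrics), default=Rating.GREEN)

    def should_notify(self) -> bool:
        return self.rating() is not Rating.GREEN


# Hypothetical metrics and thresholds.
cpu = Metric("cpu_utilization_pct", value=95.0, yellow_at=80.0, red_at=90.0)
memory = Metric("memory_utilization_pct", value=50.0, yellow_at=70.0, red_at=85.0)
host_alert = Alert("host_resources", (cpu, memory))
print(host_alert.rating().name, host_alert.should_notify())
```

Both metrics remain available for reporting and dashboards, while only the combined alert triggers a notification.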

Initial Scope
- SAP system generated alerts for BI, CRM, ECC, and Solution Manager
- Replacement of Daily Health Check
  - Manual verification of SAP system status and health
  - Executed daily for every production system
  - Up to 22 tasks for each system
  - Duration 1-2 hours per system
- Replacement of CCMS for BI, CRM, ECC
- Content server
- MaxDB databases
- End User Experience Monitoring
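
The manual effort behind the Daily Health Check can be estimated from the numbers on this slide. The sketch below assumes the check covers all 16 production systems listed on the SAP Landscape slide; that system count is an assumption for illustration, since the slide only says "every production system".

```python
from typing import Tuple


def daily_health_check_hours(
    systems: int, hours_low: float = 1.0, hours_high: float = 2.0
) -> Tuple[float, float]:
    """Return the (minimum, maximum) manual hours spent per day."""
    if systems < 0:
        raise ValueError("systems must be non-negative")
    return systems * hours_low, systems * hours_high


# Assumption: all 16 production systems are checked every day,
# at 1-2 hours per system as stated on the slide.
low, high = daily_health_check_hours(16)
print(f"Manual effort per day: {low:.0f} to {high:.0f} hours")
```

With these inputs the range is 16 to 32 hours of manual work per day, which makes the case for automating the check.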

Out of scope
- Infrastructure
  - Hosts
  - Network
  - Disk
  - Oracle Databases
  - Enterprise monitoring and incident management tool used for infrastructure. Only one source for alerts. Metrics collected for reporting purposes only in Solution Manager, no alerting enabled.
- PI Monitoring
  - Requires PI 7.3
- Connection Monitoring
  - Currently does not meet our monitoring requirements
  - Only supports R/3 & HTTP type RFC connections from SM59
  - Does not support TCP/IP connections
- BI Monitoring
  - Currently does not meet our monitoring requirements
  - Need to know immediately as soon as one step fails
  - Don't want to maintain separate monitoring objects for each step

Timeline
Feb 2012 | March 2012 | April 2012 | May-August | September 2012
- RSLF/OCC Workshop
- Review current CCMS configuration
- Upgrade of Solution Manager to 7.1 SP4
- RSLF roadmap and planning sessions
- Upgrade of SMD Agents in all environments
- Configuration of monitoring
- Continuous tuning of thresholds
- Configuration of basic dashboards
- Parallel Monitoring
- Retirement of most tasks from Daily Health Check for BI, CRM, and ECC
- Writing of Guided Procedures for OCC

Resources
- Planning sessions
  - 3 resources + SAP Consultant
- Configuration & Maintenance
  - 1.5 resources for End User Experience Monitoring
  - 1.5 resources for Technical Monitoring
  - 1 Lead
- Monitoring
  - 2 resources per shift (planned)

From CCMS to Technical Monitoring
- Reviewed existing CCMS thresholds
  - Table ALGRPCUSPF
- Reviewed existing alerting configuration
  - Only alert on actionable items
  - Table CSMTOOLASG
- Compared required alerts with availability of alerts in Technical Monitoring
- Reviewed Daily Health Check and created list of missing alerts
- Created monitoring templates
- Created custom metrics and alerts to fill gaps
- Created Guided Procedures for each alert
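
Comparing the required alerts with what Technical Monitoring delivers is essentially a set difference. The following sketch shows the idea; all alert names are hypothetical placeholders, not actual CCMS or Solution Manager identifiers.

```python
from typing import Iterable, List


def find_alert_gaps(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return required alerts that have no delivered counterpart, sorted by name."""
    return sorted(set(required) - set(available))


# Hypothetical names: alerts needed (from CCMS and the Daily Health Check)
# versus alerts delivered with the standard monitoring templates.
required_alerts = [
    "dialog_response_time",
    "update_errors",
    "spool_queue_length",
    "custom_interface_backlog",
]
available_alerts = [
    "dialog_response_time",
    "update_errors",
    "java_gc_time",
]

gaps = find_alert_gaps(required_alerts, available_alerts)
print(gaps)
```

Each entry in the resulting list is a candidate for a custom metric and alert.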

Monitoring Templates
- Created generic production template and assigned to in-scope systems
- Disabled alerts for maintenance and planned downtime
- Performed daily review of alerts
  - Adjusted thresholds as needed
  - Notified Basis team if problems were discovered
- Created copy of generic template for exception cases
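
The "generic template plus copies for exceptions" approach can be sketched as a deep copy with selective overrides. The metric names and threshold values below are hypothetical, and the dictionary layout is an assumption for illustration rather than the way Solution Manager stores templates.

```python
import copy
from typing import Any, Dict

Template = Dict[str, Dict[str, Any]]

# Hypothetical generic production template.
GENERIC_PROD_TEMPLATE: Template = {
    "dialog_response_time_ms": {"yellow": 1000, "red": 2000, "alert_enabled": True},
    "short_dumps_per_hour": {"yellow": 10, "red": 50, "alert_enabled": True},
}


def derive_template(base: Template, overrides: Template) -> Template:
    """Copy the base template and apply overrides without modifying the base."""
    derived = copy.deepcopy(base)
    for metric, changes in overrides.items():
        if metric not in derived:
            raise KeyError(f"unknown metric in overrides: {metric}")
        derived[metric].update(changes)
    return derived


# Exception case: a system with known slower dialog steps (hypothetical values).
bi_template = derive_template(
    GENERIC_PROD_TEMPLATE,
    {"dialog_response_time_ms": {"yellow": 3000, "red": 6000}},
)
print(bi_template["dialog_response_time_ms"])
```

Keeping exceptions as small overrides of one generic template makes it easier to keep the total number of templates low.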

Templates Structure at Rockwell Automation
- Technical Instance ABAP
  - Generic
  - BI
  - CRM (Batch, Middleware, User)
  - ECC (Batch, User)
  - Non-Prod
- Technical System ABAP
  - Generic
  - BI
  - CRM
  - ECC
  - Non-Prod
- Technical Instance Java
  - Generic
  - Non-Prod
- Technical System Java
  - Generic
  - Non-Prod
- Infrastructure
  - MaxDB
  - Linux
  - Windows

Guided Procedures
- Each alert has to have an associated guided procedure
- Target audience is junior level
- Created list of all active Technical Monitoring alerts
- Assigned writing of guided procedures to members of Basis team
- Guided procedures are reviewed before being published
- Added custom alert description with hyperlink to guided procedure

Alert Inbox
- Currently monitored by Basis team
- Alert gets confirmed once resolved or escalated to incident
- Incident only gets created if the problem can't be resolved right away
- Incident contains all information from alert
- Integration into SAP service desk currently not used
  - Separate project planned to integrate the Solution Manager Service Desk into an existing incident management system
  - "Create Notification" used to escalate alerts into incidents
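
The alert handling rules on this slide can be written down as a small workflow. This is a sketch under stated assumptions: the class names, field names, and status strings are invented for illustration and do not correspond to the Solution Manager alert inbox API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Alert:
    alert_id: str
    name: str
    details: Dict[str, str] = field(default_factory=dict)
    status: str = "open"


@dataclass
class Incident:
    alert_id: str
    description: str
    details: Dict[str, str]


def process_alert(alert: Alert, resolved_right_away: bool) -> Optional[Incident]:
    """Confirm the alert; create an incident only if it could not be resolved."""
    incident = None
    if not resolved_right_away:
        # The incident carries all information from the alert.
        incident = Incident(alert.alert_id, alert.name, dict(alert.details))
    # The alert is confirmed once resolved or escalated.
    alert.status = "confirmed"
    return incident


# Hypothetical alert that cannot be fixed immediately.
alert = Alert("A-1", "ICM threads exhausted", {"system": "CRM", "metric": "icm_threads"})
incident = process_alert(alert, resolved_right_away=False)
print(alert.status, incident)
```

Problems that are fixed on the spot never reach the incident management system, which keeps the incident queue limited to issues that need follow-up.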

Alert Inbox Screenshots

System Monitoring
- Status overview of monitored systems
- Access to alert inbox
- Access to metrics

Status Propagation

Management Dashboards
- Integrated into Enterprise Portal navigation
- Basic management dashboards for system availability and performance
  - Last 24 hours
  - Last 30 days
  - Last month
- Personal dashboard with alert statistics

Challenges
- No training available at time of project start
  - E2E120: Technical Monitoring in SAP Solution Manager 7.1
- Limited resources available with Technical Monitoring experience
- Underestimated Solution Manager 7.1 post-upgrade effort
  - SMD Agents needed to be upgraded
  - SAP Host Agent is now required
  - Managed System Setup had to be completely re-done

Technical Problems
- During the up-time portion of the upgrade, locking of the ABAP workbench blocked the ability to check documents in or out
- 50+ OSS Messages opened
- 250+ SAP Notes applied
- Damaged SLD Content prevented monitoring of hosts
  - SLD extremely important with Solution Manager 7.1
  - Content issues can cause major problems
  - Note 1093168 - Repair of SLD CR content
- Gaps in monitoring data
  - Implemented all Notes related to gaps in monitoring data
  - Increased the number of connections for each RFC destination
    - Solution Manager Administration -> Infrastructure -> Extractor Framework -> Configuration
    - Change Resource Cap to at least 2 (higher for Systems with many Instances)
- Timeout issues with SMD Agents in remote locations (EEM robots)
  - smd.asio.remote.time.out.ms in Agent Admin - Advanced Settings
  - Set a value higher than 10,000
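
The two tuning recommendations above (Resource Cap of at least 2, agent timeout above 10,000 ms) can be expressed as a simple sanity check. This is a sketch only: the property name `smd.asio.remote.time.out.ms` comes from the slide, but the function, the dictionary input, and the way values would be exported from the system are assumptions for illustration.

```python
from typing import Dict, List

MIN_TIMEOUT_MS = 10_000   # slide: set a value higher than 10,000
MIN_RESOURCE_CAP = 2      # slide: Resource Cap of at least 2
TIMEOUT_PROPERTY = "smd.asio.remote.time.out.ms"


def check_settings(agent_settings: Dict[str, str], resource_cap: int) -> List[str]:
    """Return human-readable findings; an empty list means both checks passed."""
    findings = []
    raw_timeout = agent_settings.get(TIMEOUT_PROPERTY)
    if raw_timeout is None:
        findings.append(f"{TIMEOUT_PROPERTY} is not set")
    elif int(raw_timeout) <= MIN_TIMEOUT_MS:
        findings.append(f"{TIMEOUT_PROPERTY}={raw_timeout} should be higher than {MIN_TIMEOUT_MS}")
    if resource_cap < MIN_RESOURCE_CAP:
        findings.append(f"Resource Cap {resource_cap} should be at least {MIN_RESOURCE_CAP}")
    return findings


# Hypothetical values, as they might be copied from the configuration screens.
findings = check_settings({TIMEOUT_PROPERTY: "10000"}, resource_cap=1)
for finding in findings:
    print(finding)
```

A check like this is most useful for remote agents such as EEM robots, where timeouts were observed.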

Early wins
- Discovered network performance issue
- High number of full garbage collections in BI
- Not enough ICM threads on CRM
- Login delays on External Portal

What is planned for the future?
- Inclusion of all production SAP systems
- Transfer monitoring of alert inbox from Basis team to OCC
- Retirement of custom SAP monitoring tool
- Custom dashboards for specific KPIs
- Patch Solution Manager to SP6
- Implement Agent-on-the-fly for all environments

Final words
- Was it worth it? Definitely!
- Expect significant growth of your Solution Manager system
- Start small, limit the scope
- Minimize the number of monitoring templates
- Don't alert if it's not actionable
- Monitoring is important, give it the proper priority and resources

Please complete the session evaluation
ALM230