The Total Economic Impact of Automating Systems Management
Travis Greene, Chief Service Management Strategist
Agenda
  The Challenges of Non-Automated Systems Management
  Steps to Apply Automation Effectively
  Examples of Systems Management Automation
  The Total Economic Impact Results
The Challenges of Non-Automated Systems Management
Systems Management Challenge #1
Reduce management and administration costs to meet budget realities and shift resources to new and innovative services
[Chart: spending in $ billion USD ($0 to $200) and servers installed in millions (0 to 50), 1996 to 2011. Legend: Power & Cooling, Management & Administration, Hardware Expenditures, Servers Installed. Source: IDC, March 2008]
Systems Management Challenge #2
Balance customer demand for quality with cost
[Diagram labels: Customer Demands (Rapid Response, Service Levels Met, Consistent Delivery); IT Organization Challenges (Workload Backlog, Human Error, New Employees); Custom Services; Standard Services; Costs]
Steps to Apply Automation Effectively
Step 1: Reduce Inefficiencies
With current management tools, using automation
  Each group has its own tools, often with overlapping features
  Only a small percentage of those features is actually used
  IT process automation (ITPA), through adapters, can leverage more features
    Reducing: human error, manual labor
    While improving: knowledge sharing, consistency
[Diagram labels: IT Functions (Service Desk, Network Management, Security Management, Database Management, Application Management); Best of Breed Management Tools; Managed Technologies]
Step 2: Integrate Tools
Across domains and disciplines
  Begin with integration within the operations and security domains separately
  As maturity increases, integrate across domains
[Diagram labels: Security, Operations, User Monitoring, Delegated Authority, Change Control, SIEM, Config Assessment, Response Time Monitoring, Service Level Reporting, Performance & Availability, Problem Escalation, Event Prioritization, Ticketing]
Step 3: Integrate with the Business
Involve the business in IT management processes
  Gain just-in-time approvals from stakeholders and satisfy self-service requests
    Provision virtual machines
    Approve policy exceptions
    Provision new users
  Provide automated reporting and charge-back
    ROI, SLA, process improvement
    Charge-back for resource usage (e.g. virtual machines) or process execution (e.g. change management)
[Diagram labels: ITPA, Systems Management, Ticketing, Helpdesk, ROI and SLA Reports, Billing System, Management, LOB Requestor or Stakeholder, Other Sources (RFCs, CMDB, change monitoring, etc.)]
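The charge-back idea on this slide can be sketched in a few lines of Python. This is only an illustration: the rate table, the record layout, and the function name `charge_back` are hypothetical and do not come from the presentation or from any NetIQ product.

```python
from collections import defaultdict

# Hypothetical rates for illustration only; the presentation gives no prices.
RATES = {
    "vm_hour": 0.25,          # resource usage: one virtual machine for one hour
    "change_request": 40.00,  # process execution: one change management run
}


def charge_back(usage_records, rates=RATES):
    """Sum charges per line of business (LOB).

    Each usage record is a (lob, item, quantity) tuple.
    """
    totals = defaultdict(float)
    for lob, item, quantity in usage_records:
        totals[lob] += rates[item] * quantity
    return dict(totals)


if __name__ == "__main__":
    usage = [
        ("Finance", "vm_hour", 100),
        ("Finance", "change_request", 2),
        ("Sales", "vm_hour", 40),
    ]
    for lob, amount in sorted(charge_back(usage).items()):
        print(f"{lob}: ${amount:.2f}")
```

In practice the usage records would come from the automation platform's own reporting and the totals would feed the billing system shown in the diagram.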
Examples of Systems Management Automation
Perform Routine Maintenance
Such as Rebooting Servers
1. NetIQ Aegis initiates the server reboot process based on a schedule and suppresses reboot-related events
2. NetIQ Aegis commands the load balancer to block new sessions to the first server. Saved: 1 minute
3. NetIQ Aegis commands NetIQ AppManager to monitor the server until it reaches zero active sessions. Saved: 15 minutes
4. NetIQ Aegis commands NetIQ AppManager to reboot the server and wait for completion. Saved: 15 minutes
5. NetIQ Aegis commands NetIQ AppManager to validate server health. Saved: 3 minutes
6. NetIQ Aegis commands the load balancer to enable new sessions. Saved: 5 minutes
7. NetIQ Aegis commands NetIQ AppManager to verify service performance. Saved: 1 minute
8. NetIQ Aegis sends a progress notification email to the administrator. Saved: 1 minute
9. NetIQ Aegis repeats steps 2-8 for each additional server in the group. Saved: 10x minutes
[Diagram labels: Administrator, NetIQ AppManager ResponseTime, NetIQ AppManager, NetIQ Aegis, Load Balancer, Active Sessions, Web Servers; callouts 1-9 mark the steps]
Total Time Saved: 410 Minutes
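The 410-minute total can be checked with a short Python sketch. The step labels are paraphrased from the slide. The group size of 10 servers is an assumption inferred from the "10x" note and the stated total; the slide does not give it explicitly.

```python
# Minutes saved per server, taken from steps 2-8 of the slide.
PER_SERVER_SAVINGS = {
    "block new sessions": 1,
    "wait for zero active sessions": 15,
    "reboot and wait for completion": 15,
    "validate server health": 3,
    "enable new sessions": 5,
    "verify service performance": 1,
    "send progress email": 1,
}


def total_minutes_saved(server_count):
    """Minutes saved when steps 2-8 run once for every server in the group."""
    return sum(PER_SERVER_SAVINGS.values()) * server_count


if __name__ == "__main__":
    # Assumed group size of 10 servers: 41 minutes per server * 10 = 410.
    print(total_minutes_saved(10))
```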
Recover from Common Events
Such as Low Disk Space Conditions
1. Available disk space falls below threshold
2. NetIQ AppManager generates an event, triggering a process in NetIQ Aegis
3. NetIQ Aegis requests disk usage analysis from NetIQ AppManager. Saved: 15 minutes
4. NetIQ Aegis sends email to admin requesting approval to clean up. Saved: 5 minutes
5. If no response is received within a defined time, NetIQ Aegis escalates to a higher level of management. Saved: 5 minutes
6. Administrator approves partial cleanup through NetIQ Aegis. Saved: 4 minutes
7. NetIQ Aegis commands NetIQ AppManager to perform cleanup. Saved: 15 minutes
8. NetIQ Aegis sends confirmation email to the administrator. Saved: 4 minutes
[Diagram labels: NetIQ Aegis, Administrator, Management, NetIQ AppManager, NetIQ AppManager Agent, Archive, Trash; approval form with columns File Type, Delete?, Archive? and rows *.dmp, *.log; callouts 1-8 mark the steps]
Total Time Saved: 48 Minutes
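Steps 4 through 7 amount to an approval chain with escalation, followed by a cleanup limited to the approved file types. The Python sketch below illustrates that logic only. The function names, the `ask` callback, and the convention that a timeout is represented by `None` are assumptions made for the example, not the NetIQ Aegis API.

```python
from fnmatch import fnmatchcase


def request_approval(approvers, ask):
    """Ask each approver in turn, escalating when one does not respond.

    `ask(person)` returns the list of approved file patterns, or None if
    that person did not answer within the defined time (an assumed convention).
    """
    for person in approvers:
        reply = ask(person)
        if reply is not None:
            return person, reply
    return None, None


def files_to_clean(file_names, approved_patterns):
    """Select only the files matching a pattern the approver signed off on."""
    return sorted(
        name
        for name in file_names
        if any(fnmatchcase(name, pattern) for pattern in approved_patterns)
    )


if __name__ == "__main__":
    # The admin times out, so the request escalates to the manager,
    # who approves a partial cleanup covering dump files only.
    replies = {"admin": None, "manager": ["*.dmp"]}
    who, patterns = request_approval(["admin", "manager"], replies.get)
    print(who, files_to_clean(["core.dmp", "app.log", "data.db"], patterns))
```

Returning the approved patterns rather than a yes/no answer mirrors the "partial cleanup" in step 6, where the administrator can approve some file types and not others.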
Update the CMDB
With Reconciled CIs from Multiple Management Tools
1. A new NetIQ Aegis adapter is implemented, providing connectivity to a monitoring tool such as NetIQ Secure Configuration Manager
2. NetIQ Aegis reconciles the configuration information from the new tool with what is known from other tools by synchronizing computers and groups using NetIQ IQRM. Saved: 60 minutes
3. NetIQ Aegis updates the CMDB using a specific adapter or the NetIQ Aegis Adapter for Databases. Saved: 30 minutes
4. NetIQ Aegis continues to reconcile new configuration information as it is received via multiple adapters. Saved: 5 minutes
5. NetIQ Aegis continues to update the CMDB on schedule. Saved: 15 minutes
6. If a conflict is found between the CMDB and the configuration information that NetIQ Aegis has, an event is raised requesting manual reconciliation
[Diagram labels: NetIQ Aegis, NetIQ Secure Configuration Manager, NetIQ AppManager, NetIQ Security Manager, Admin, BMC Remedy, CMDB; callouts 1-5 mark the steps]
Total Time Saved: 110 Minutes
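The reconcile-then-flag-conflicts behavior in steps 2 through 6 can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions: configuration items are keyed by host name, attributes are flat key/value pairs, and a conflicting value is left untouched in the CMDB pending manual review. None of this reflects the actual NetIQ IQRM or adapter data model.

```python
def reconcile(cmdb, discovered):
    """Merge discovered CI attributes into a copy of the CMDB.

    Returns (updated_cmdb, conflicts). New CIs and new attributes are added.
    When a discovered value disagrees with the CMDB, the CMDB value is kept
    and the disagreement is reported for manual reconciliation.
    """
    updated = {ci: dict(attrs) for ci, attrs in cmdb.items()}
    conflicts = []
    for ci, attrs in discovered.items():
        record = updated.setdefault(ci, {})
        for key, value in attrs.items():
            if key not in record:
                record[key] = value
            elif record[key] != value:
                conflicts.append((ci, key, record[key], value))
    return updated, conflicts


if __name__ == "__main__":
    cmdb = {"web01": {"os": "Windows Server 2003"}}
    discovered = {
        "web01": {"os": "Windows Server 2008", "ip": "10.0.0.5"},
        "db01": {"os": "Linux"},
    }
    updated, conflicts = reconcile(cmdb, discovered)
    print(updated)
    print(conflicts)  # each entry would raise a manual-reconciliation event
```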
Identify Change-Induced Incidents
For Faster Service Restoration
1. An end user raises an incident with the help desk, describing unavailability of a service, and the help desk logs a ticket
2. NetIQ Aegis collects information from change monitoring tools such as NetIQ Change Guardian, or simply collects all recent changes. Saved: 30 minutes
3. NetIQ Aegis populates the ticket with information from change monitoring. Saved: 10 minutes
4. NetIQ Aegis monitors the change monitoring tools for additional information and updates the ticket. Saved: 10 minutes
5. If the help desk cannot resolve the incident with the information provided, administrators are contacted to resolve it
6. NetIQ Aegis monitors the ticket for a resolution code and looks for unintended consequences of the resolution from other monitoring tools, updating the ticket as necessary. Saved: 15 minutes
7. NetIQ Aegis closes the ticket if no additional events are detected within a specified amount of time. Saved: 1 minute
[Diagram labels: NetIQ Aegis, Business Service, User, Ticketing System, Helpdesk, Administrators, Other Sources (RFCs, CMDB, NetIQ Change Guardian, etc.); callouts 1-7 mark the steps]
Total Time Saved: 66 Minutes
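Steps 2 and 3 can be pictured as a time-window filter over change records followed by a formatted ticket note. The sketch below is an assumption-laden illustration: the 24-hour lookback window, the record fields, and the function names are hypothetical, and real change data would come from a change monitoring tool rather than an in-memory list.

```python
from datetime import datetime, timedelta


def recent_changes(changes, service, incident_time, window=timedelta(hours=24)):
    """Return changes to `service` made within `window` before the incident,
    newest first. The 24-hour default is an assumed value."""
    start = incident_time - window
    hits = [
        change
        for change in changes
        if change["service"] == service and start <= change["time"] <= incident_time
    ]
    return sorted(hits, key=lambda change: change["time"], reverse=True)


def ticket_note(changes):
    """Format the matching changes as text to add to the help desk ticket."""
    if not changes:
        return "No recent changes recorded for this service."
    lines = [f"{c['time']:%Y-%m-%d %H:%M} {c['summary']}" for c in changes]
    return "Recent changes:\n" + "\n".join(lines)


if __name__ == "__main__":
    incident = datetime(2009, 3, 2, 9, 0)  # hypothetical incident time
    changes = [
        {"service": "email", "time": datetime(2009, 3, 2, 8, 30),
         "summary": "Patched mail server"},
        {"service": "crm", "time": datetime(2009, 3, 2, 8, 45),
         "summary": "CRM config edit"},
    ]
    print(ticket_note(recent_changes(changes, "email", incident)))
```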
Run Business Jobs
And Replace Costly Job Scheduling Tools
1. NetIQ Aegis initiates the Data Replication process based on a daily schedule. Saved: 1 minute
2. NetIQ Aegis transfers 3000 files from the customer download server to six load-balanced application servers. Saved: 60 minutes
3. NetIQ Aegis confirms successful transfer of all files after a designated time period based on file size and transfer rates. Saved: 20 minutes
4. If there are any failures, NetIQ Aegis collects information, notifies an administrator via email, and re-initiates the transfer after approval or after a designated amount of time
5. NetIQ Aegis continues to retry the transfer and contact the admin a designated number of times. Saved: 5 minutes
6. Once file transfer is completed, NetIQ Aegis initiates the processing of data on each application server and waits for completion. Saved: 5 minutes
7. NetIQ Aegis sends a completion email to the designated administrator, or a failure email if not completed on time. Saved: 4 minutes
[Diagram labels: Customer Download Server, Admin, NetIQ Aegis, Application Servers; callouts 1-7 mark the steps]
Total Time Saved: 95 Minutes
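The retry behavior in steps 4 and 5 can be sketched as a bounded loop that resends only the files that failed. This is an illustrative outline, not the product's implementation: `send` and `notify` are hypothetical callbacks, the limit of three attempts is an assumed value, and the approval or waiting period between attempts is left out for brevity.

```python
def transfer_with_retries(files, send, max_attempts=3, notify=print):
    """Try to send every file, retrying failures up to `max_attempts` times.

    `send(name)` returns True on success. `notify(message)` stands in for the
    email to the administrator. Returns the files that never transferred.
    """
    pending = list(files)
    for attempt in range(1, max_attempts + 1):
        pending = [name for name in pending if not send(name)]
        if not pending:
            return []
        notify(f"attempt {attempt}: {len(pending)} file(s) failed")
    return pending


if __name__ == "__main__":
    attempts = {}

    def flaky_send(name):
        """Simulated transfer in which b.dat fails on its first attempt only."""
        attempts[name] = attempts.get(name, 0) + 1
        return not (name == "b.dat" and attempts[name] == 1)

    leftover = transfer_with_retries(["a.dat", "b.dat", "c.dat"], flaky_send)
    print("not transferred:", leftover)
```

An empty return value corresponds to step 6, where processing starts on each application server. A non-empty one corresponds to the failure email in step 7.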
Centralize Monitoring
And Resolve Custom Business Application Events
1. Business application performance begins to degrade and the application writes an event to an MS SQL database
2. NetIQ Aegis detects the new row that has been added to the database using the Database Adapter and reads the details. Saved: 1 minute
3. NetIQ Aegis forwards the event into NetIQ AppManager, populates the event details with affected user names and event log info, and reconciles the application name with the associated server name and object. Saved: 10 minutes
4. NetIQ Aegis sends an email to the administrator with designated options for event recovery
5. Administrator replies to NetIQ Aegis, which commands NetIQ AppManager to resolve the known error using established procedures. Saved: 20 minutes
6. NetIQ Aegis updates the database, closes the event in NetIQ AppManager, and emails the application administrator. Saved: 15 minutes
[Diagram labels: NetIQ Aegis, Admin, Business Service, NetIQ AppManager, Database Server; resolution form with Option 1, Option 2, Option 3, Select; callouts 1-6 mark the steps]
Total Time Saved: 46 Minutes
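Step 2, detecting a newly added row, is commonly done by remembering the highest row id already processed. The Python sketch below shows that pattern with the standard-library `sqlite3` module standing in for the MS SQL database. The table name `app_events`, its columns, and the function name are hypothetical, and this is not how the Database Adapter is known to work internally.

```python
import sqlite3


def fetch_new_events(conn, last_seen_id):
    """Return rows added since `last_seen_id` plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, application, message FROM app_events "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last_seen = rows[-1][0] if rows else last_seen_id
    return rows, new_last_seen


if __name__ == "__main__":
    # In-memory SQLite database used as a stand-in for the application's
    # event table (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_events ("
        "id INTEGER PRIMARY KEY, application TEXT, message TEXT)"
    )
    conn.execute(
        "INSERT INTO app_events (application, message) VALUES (?, ?)",
        ("orders", "response time degraded"),
    )
    events, last_seen = fetch_new_events(conn, 0)
    print(events, last_seen)
    # A second poll with the updated mark returns nothing new.
    print(fetch_new_events(conn, last_seen))
```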
Prioritize and Resolve Events
Based on the Impact to End Users
1. NetIQ AppManager detects multiple MS SQL database events, including high lock utilization, a high number of master DB locks, and high lock wait time
2. NetIQ AppManager ResponseTime for Web detects degradation in performance for a business application
3. These events are correlated in NetIQ Aegis based on a business service that has been defined in the Resource Management Database. Saved: 10 minutes
4. NetIQ Aegis closes the symptomatic events in NetIQ AppManager and opens a new reprioritized event that indicates high SQL lock utilization. Saved: 10 minutes
5. NetIQ Aegis alerts the database administrator by email, describing the situation along with a recommendation for resolution. Saved: 5 minutes
6. The database administrator replies to the email, approving termination of the SQL SPID associated with the user consuming the most locks. Saved: 8 minutes
7. NetIQ Aegis commands NetIQ AppManager to terminate the SQL SPID and replies to the administrator with the results. Saved: 15 minutes
[Diagram labels: Business Service, Web Server, Application Server, Database Server, NetIQ Aegis, NetIQ AppManager, Administrator; resolution form with Yes, No, Kill SPID; callouts 1-7 mark the steps]
Total Time Saved: 48 Minutes
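The correlation in steps 3 and 4 can be illustrated by grouping events under the business service their host belongs to, then raising one reprioritized event when a service shows both end-user impact and database lock symptoms. The host-to-service map, the event fields, and the `kind` values below are hypothetical stand-ins for what the Resource Management Database and NetIQ AppManager would supply.

```python
def correlate(events, service_map):
    """Group events by business service and consolidate related symptoms.

    Returns one high-priority event per service that has both a degraded
    response-time event and at least one SQL lock event.
    """
    by_service = {}
    for event in events:
        service = service_map.get(event["host"])
        if service is not None:
            by_service.setdefault(service, []).append(event)

    consolidated = []
    for service, group in by_service.items():
        user_impact = [e for e in group if e["kind"] == "response_time"]
        lock_events = [e for e in group if e["kind"] == "sql_lock"]
        if user_impact and lock_events:
            consolidated.append({
                "service": service,
                "priority": "high",
                "summary": "High SQL lock utilization",
                # Symptomatic events this consolidated event replaces.
                "replaces": sorted(e["id"] for e in user_impact + lock_events),
            })
    return consolidated


if __name__ == "__main__":
    service_map = {"web01": "order-entry", "sql01": "order-entry",
                   "sql02": "reporting"}
    events = [
        {"id": 1, "host": "sql01", "kind": "sql_lock"},
        {"id": 2, "host": "sql01", "kind": "sql_lock"},
        {"id": 3, "host": "web01", "kind": "response_time"},
        {"id": 4, "host": "sql02", "kind": "sql_lock"},
    ]
    print(correlate(events, service_map))
```

The lock event on `sql02` stays untouched here because no end-user impact was seen for its service, which is the prioritization by user impact that the slide title describes.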
The Total Economic Impact Results
* Determined using the Aegis ROI calculator developed by Forrester Consulting, based on a representative customer with 1,000 servers.
[Chart legend: Required, Optional]
ROI Analysis Available
Independently developed by an analyst firm