Top 10 Reasons to Automate your IT Run Books
Run Book Automation is an emerging technology space that is being adopted by many of the largest, most sophisticated IT organizations. While each firm has slightly different priorities for implementing automation within the data center, a standard set of objectives has emerged across companies. This paper lists and explains the top 10 reasons why CIOs, IT managers, and IT Production Support teams are placing Run Book Automation at the top of their project lists.

Data Center Management Trends and Tools

Organizations today have more applications and more servers, rely on larger global networks, and store more data than ever before. The largest corporations operate over 1,000 applications and many thousands of servers. Analysts such as Gartner and IDC predict that this growth will continue as businesses look to differentiate themselves within an ever-expanding global economy by developing and deploying powerful business-specific algorithms and customized .NET and J2EE applications.

As data centers have grown and become more complex, IT organizations have relied on systems management tools to manage IT infrastructure more efficiently and to get greater leverage from existing headcount. Since 2000, IT has continually been asked to do more with less, add value to the business, run IT like a business, and minimize headcount additions. IT has deployed many systems management tools to enable leveraged management of infrastructure, with the most common tools falling into 3 categories: systems monitoring, ticketing, and change/configuration management.

The broader systems monitoring category is one of the largest software categories at $7B per year, and includes network, server, application, and database monitoring tools in addition to event console tools. These tools allow IT organizations to keep their fingers on the pulse of the infrastructure and determine whether it is performing well or falling short of business objectives.
Today's Challenge

As deployments of systems management tools mature, and organizations become more adept at using systems monitoring and ticketing functionality, they seek new methods to increase leverage and organizational efficiency. Although monitoring solutions are adept at detecting failures in the infrastructure, and help desk solutions help to track tickets as incidents get routed and escalated, they offer little help in resolving these incidents. IT Operations teams battle a high volume of alerts, error-prone manual procedures, and the complexity of the supported infrastructure on a daily basis. There is a lack of knowledge transfer between level 3 experts and frontline operators, resulting in high escalation percentages and steep learning curves. Existing solutions have not been sufficient to alleviate these IT pain points; a new category of software solution is needed to take IT Operations beyond monitoring and ticketing.

What's Next?

With the emergence of new Run Book Automation technologies, organizations are finding that they can be more responsive to alerts and incidents, increase infrastructure uptime, and reduce operational cost, all at the same time. In fact, there are 10 specific areas where IT Professionals are unleashing value within their organizations through automation.

Top 10 Value Creation Methods from Run Book Automation

1. Drastically reduce time to resolution for incidents/alerts.
2. Empower frontline IT to resolve more incidents.
3. Reduce escalations and minimize alert floods.
4. Enable a consistent, repeatable process for orchestrating change.
5. Create linkage between ITIL incident and problem management processes.
6. Capture full incident resolution audit trail; create process documentation.
7. Integrate role-based access control into incident resolution processes.
8. Capture tribal knowledge in a usable/maintainable way for reduced training cost.
9. Automate repetitive maintenance procedures.
10. Integrate disparate systems management tools and processes.

Drastically Reduce Time to Resolution for Incidents/Alerts

Many organizations have rolled out advanced monitoring solutions to track performance for their growing IT infrastructure. With these tools in place, organizations are alerted quickly when an incident occurs with their applications or the supporting infrastructure. However, resolution of these incidents is still a largely manual process that relies on tribal knowledge, escalations, and bridge calls, resulting in slow and inconsistent resolution times.

With a Run Book Automation solution, IT organizations can automate many of the manual diagnostic and resolution steps, accelerating their incident resolution process. Run Book Automation tools must enable the following rapid resolution functions:

- Self-heal automated workflows that can be initiated by an operator or automatically from within monitoring tools
- Rapid diagnostic flows that can iterate across data center infrastructure faster than any manual process; this enables rapid troubleshooting and information capture for the front line, and for level 2/3 when an escalated action is required
- Automatic remediation, including server and service restarts, even for complex clustered infrastructure operating with network load balancers
- Automatic ticket creation, update, and closure, in addition to alert updates and closures upon ultimate resolution of the incident
With these features, Run Book Automation tools enable fast automated response to alerts and incidents, with full documentation and information capture for diagnostics, remediation, and root-cause correction.

Empower Frontline IT to Resolve More Incidents

Incidents and alerts are handled first by frontline IT operators. Research has shown that in many organizations, up to 50% of all incidents are escalated to level 3 system and network administrators and management. Once these escalations occur, many IT Professionals are forced to spend hours on bridge lines and conference calls, even during off-hours and on weekends.

Run Book Automation tools can help to transform this ad hoc, manual, tribal-knowledge-driven process into streamlined workflows. Upon detection of an incident, Run Book Automation tools can execute automatic diagnostic and triage flows to narrow down where the root cause of an incident may reside. With an easy-to-follow, wizard-based user interface, these automation flows can be executed directly by frontline operators, eliminating the need for level 3 experts and management to be involved in every troubleshooting process. After the diagnostic operations have been executed automatically and the source of the incident has been found, the frontline IT staff can trigger remediation operations or escalate as necessary.

Run Book Automation tools enable the front line to perform more of the diagnostic, triage, and remediation steps and, in many cases, to solve problems completely using corporate knowledge that used to reside only in the collective heads of level 3 administrators. With this capability, Run Book Automation enables IT organizations to operate as efficiently as possible when responding to incidents and alerts.
Reduce Escalations and Minimize Alert Floods

As mentioned in section 2, Run Book Automation tools can capture the ad hoc processes and tribal knowledge that reside within the IT Operations organization. In many organizations, up to 50% of all incidents are escalated to level 3 system and network administrators after being initially handled by the front line. These escalations tend to be very costly and time-consuming for the broader IT organization. Even a small reduction in escalation volume can free significant bandwidth for IT experts to focus on more strategic and proactive issues.

Because Run Book Automation tools capture the tribal knowledge that level 3 experts use in the diagnostic, triage, and repair process, they enable frontline IT to perform steps that previously required escalation. With Run Book Automation tools, frontline IT can diagnose and remediate incidents more quickly, with far less escalation. Here is one example of an automated incident resolution process using Run Book Automation:

1. Incident detected by proactive monitoring tools.
2. Run Book Automation repair flow initiated in self-heal mode, or by an operator in visually guided mode.
3. Frontline operations perform automated diagnostic actions across servers, applications, network, etc.
4. Likely incident causes are detected and remediation actions determined.
5. Frontline performs necessary triage and notification automatically in the Run Book Automation tool.
6. Frontline remediates the incident using the Run Book Automation tool, with escalation to level 3 only where necessary.

In summary, Run Book Automation tools enable the front line to perform more of the diagnostic and remediation steps and, in many cases, to solve problems using corporate knowledge that would otherwise reside only in the heads of level 3 administrators. Run Book Automation enables IT organizations to operate as efficiently as possible, with the fewest escalations, when responding to incidents and alerts.
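The six-step resolution process described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of any particular Run Book Automation product: the names `FlowResult` and `run_repair_flow`, and the lambda "checks" that stand in for real server, application, and network probes, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowResult:
    """Everything the flow learned and did for one incident (hypothetical type)."""
    incident: str
    diagnostics: Dict[str, bool] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)
    escalated: bool = False


def run_repair_flow(
    incident: str,
    checks: Dict[str, Callable[[], bool]],
    remediations: Dict[str, Callable[[], bool]],
) -> FlowResult:
    """Diagnose, remediate what the front line can, and escalate the rest."""
    result = FlowResult(incident=incident)

    # Step 3: run every diagnostic check and record its outcome.
    for name, check in checks.items():
        result.diagnostics[name] = check()

    # Step 4: any failed check is treated as a likely cause.
    failed = [name for name, healthy in result.diagnostics.items() if not healthy]

    # Steps 5-6: try a known remediation; escalate if none exists or it fails.
    for name in failed:
        fix = remediations.get(name)
        if fix is not None and fix():
            result.actions.append(f"remediated:{name}")
        else:
            result.actions.append(f"escalated:{name}")
            result.escalated = True
    return result


# Hypothetical checks and fixes; a real flow would call monitoring and admin tools.
result = run_repair_flow(
    incident="web application slow",
    checks={
        "app_server": lambda: False,
        "database": lambda: True,
        "network": lambda: False,
    },
    remediations={"app_server": lambda: True},  # e.g. a service restart
)
print(result.actions)    # ['remediated:app_server', 'escalated:network']
print(result.escalated)  # True
```

In this sketch only the network problem, for which no frontline remediation is defined, reaches level 3; the application server problem is resolved without escalation.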
Enable a Consistent, Repeatable Process for Orchestrating Change

Many large IT organizations support hundreds to thousands of applications, servers, and devices, and the number continues to grow at a rapid rate. While some organizations have well-documented processes for rolling out changes to these environments in a consistent fashion, many have undocumented or outdated processes, and their frontline IT Operations staff find themselves scrambling when misconfigurations result in production issues. Organizations that have implemented Run Book Automation tools have found 3 primary advantages in automating the change and configuration management processes:

1. Enable IT staff to execute automated change management procedures consistently. With change and configuration management procedures captured in a Run Book Automation tool, IT system administrators can initiate an automated set of procedures to check applications and servers for compliance status and perform updates when necessary for servers, network devices, and storage devices. These change management workflows can be run across hundreds of servers and devices, with full documentation of the outcome captured for compliance requirements.

2. Improve the overall cost efficiency of IT management staff. With a Run Book Automation tool deployed for orchestrating change and configuration management, previously siloed IT specialist teams can eliminate manual and error-prone procedures by automating the end-to-end change and configuration management process.

3. Increase agility and responsiveness. With detailed provisioning steps captured in automated run books, IT departments can react more quickly to changing business needs by reducing the time to deploy new infrastructure. Organizations have consistently seen a 50 to 70 percent reduction in the time to provision new systems and infrastructure.
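The compliance check and update described in the first advantage can be illustrated with a small Python sketch. It assumes, purely for illustration, that each server's configuration is a flat dictionary of settings and that an "update" is a dictionary write; the function name `enforce_baseline`, the host names, and the setting names are hypothetical.

```python
from typing import Dict, List, Tuple

Config = Dict[str, str]


def enforce_baseline(
    baseline: Config, hosts: Dict[str, Config]
) -> List[Tuple[str, str, str, str]]:
    """Bring each host to the baseline; return (host, key, old, new) per change."""
    changes = []
    for host, config in sorted(hosts.items()):
        for key, wanted in baseline.items():
            current = config.get(key, "<unset>")
            if current != wanted:
                config[key] = wanted  # stand-in for a real update action
                changes.append((host, key, current, wanted))
    return changes


baseline = {"ntp_server": "ntp.example.com", "ssh_root_login": "no"}
hosts = {
    "web-01": {"ntp_server": "ntp.example.com", "ssh_root_login": "yes"},
    "web-02": {"ntp_server": "ntp.example.com", "ssh_root_login": "no"},
    "db-01": {"ssh_root_login": "no"},
}

changes = enforce_baseline(baseline, hosts)
for change in changes:
    print(change)

# A second run finds nothing to change, which is what makes the process repeatable.
second_run = enforce_baseline(baseline, hosts)
print(second_run)  # []
```

The returned list of changes plays the role of the documented outcome that the text says is captured for compliance requirements.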
Create Linkage between ITIL Incident and Problem Management Processes

As mentioned in section 4, many large organizations are responding to hundreds, if not thousands, of alerts a day. This level of alert management tends to push most IT Operations groups into a firefighting mode in which they react to incidents as fast as possible in an effort to maintain critical application and infrastructure uptime. In this firefighting mode, tickets are usually not updated with relevant troubleshooting information after the alert is cleared. Industry studies show that on average only 2% of incidents are tracked to closure with a ticket. Unfortunately, this situation leaves many IT Operations organizations managing symptoms (incident management) instead of addressing root causes to fix problems.

When IT organizations implement a Run Book Automation tool, they can quickly begin to link their incident (symptom) management efforts to more effective problem management. Run Book Automation flows that enable rapid response to alerts through automated diagnostics, triage, and repair also fully capture the inputs and outputs of each automation activity. For example, an automation flow for a slow-responding J2EE application will check application and server status, network status, load-balancer status, database status, etc., and record all of the inputs and outputs as part of the diagnostic and repair process. If necessary, the automation flow can restart specific servers and then recheck application performance. Over time, if this application is consistently slow and this specific automation flow is run routinely, data in the Run Book Automation tool is aggregated, allowing for detailed analysis of specific devices (e.g., insufficient memory in a server) or potential application issues (e.g., memory leaks) that need to be addressed.
Leading Run Book Automation tools can capture and record all inputs and outputs as part of incident response, and allow the data to be aggregated and parsed for performance and root-cause analysis by domain, application, and configuration item (CI).
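To make the link from incident data to problem management concrete, the following Python sketch records the inputs and outputs of each flow step and then aggregates unhealthy results by configuration item. It is a minimal illustration under assumptions: the `StepRecord` and `AuditLog` types, the flow and CI names, and the convention that a step reports a `status` field are all hypothetical and not drawn from any specific tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StepRecord:
    flow: str
    step: str
    ci: str                   # configuration item the step ran against
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


class AuditLog:
    """In-memory stand-in for the tool's database of step records."""

    def __init__(self) -> None:
        self.records: List[StepRecord] = []

    def record(self, flow: str, step: str, ci: str,
               inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        self.records.append(StepRecord(flow, step, ci, inputs, outputs))

    def failures_by_ci(self, flow: str) -> Counter:
        """Count steps whose output reported an unhealthy status, per CI."""
        return Counter(
            r.ci for r in self.records
            if r.flow == flow and r.outputs.get("status") != "ok"
        )


log = AuditLog()
# Three hypothetical runs of the same flow for a slow J2EE application.
for free_mb in (120, 95, 80):
    log.record("slow-j2ee-app", "check_server", "app-server-01",
               {"threshold_mb": 256}, {"status": "low_memory", "free_mb": free_mb})
    log.record("slow-j2ee-app", "check_database", "db-01",
               {"timeout_s": 5}, {"status": "ok"})

print(log.failures_by_ci("slow-j2ee-app").most_common(1))
# [('app-server-01', 3)]
```

Each individual run treats a symptom, but the aggregated counts point at one server that is repeatedly short of memory, which is the kind of evidence problem management needs.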
Capture Full Incident Resolution Audit Trail; Create Process Documentation

Run Book Automation tools create value by automating alert and incident resolution steps that are typically performed manually, thus maximizing application uptime and enabling IT Professionals to focus on strategic IT issues. As mentioned in section 5, Run Book Automation tools also enable automatic information capture for each automation step and for the automation flow in aggregate. The information from each flow execution is captured automatically by the Run Book Automation tool and stored in the tool's database.

Additionally, Run Book Automation tools allow documentation to be created automatically from deployed automation flows: once a flow is created, IT Professionals can select an option that generates a full step-by-step document of what is performed in each step of the flow. This feature allows IT Professionals to focus on rapid authoring of automations rather than on time-consuming documentation of step-by-step procedures.

With a Run Book Automation tool, the diagnostic, triage, and remediation process is fully documented from initial alert creation through resolution, including automatic ticket and alert update and closure. With this end-to-end process automated, users enjoy the additional benefits of having processes fully documented and the outcome of each process execution fully recorded, two critical requirements for Sarbanes-Oxley compliance audits.
Integrate Role-based Access Control into Incident Resolution Processes

Because large IT organizations are constantly battling a high volume of alerts and escalations, many struggle with information hand-offs and efficient escalation between levels 1, 2, and 3. Often these information exchanges occur in the middle of the night, without tools to capture the current state of systems or where the diagnostic and triage process left off. Run Book Automation tools, when properly implemented, solve this problem with two powerful features: #1, role-based access control; and #2, the full capture of inputs and outputs for each automation step.

Feature #2, the full capture of input and output data, has already been discussed in section 6. Feature #1, role-based access control, augments feature #2 to ensure that each IT Professional performs only the automation steps for which they have authorization. In supporting role-based access control, Run Book Automation tools use gated transitions that require credentials appropriate for each automation step. These requirements can be mapped to an organization's Active Directory to ensure that the proper credentials are presented prior to execution.

Together, these two features enable fewer and more efficient escalations: automation flows escalate only when frontline operators lack the proper permissions or when the automation flow requires expert operator intervention. When an escalation does occur, it is far more efficient because level 2 and level 3 can easily determine which diagnostic and repair steps have been executed, what the output of each step was, and what the likely next steps should be.
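The idea of a gated transition can be shown with a short Python sketch. It is a simplified illustration under assumptions: roles are held in a plain set rather than looked up in a directory service such as Active Directory, and the `Step` type, `run_gated_flow` function, role names, and step names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Step:
    name: str
    required_role: str           # role needed to pass this step's gate
    action: Callable[[], str]    # returns the step's output


def run_gated_flow(steps: List[Step], operator_roles: Set[str]) -> List[str]:
    """Run steps in order; stop and escalate at the first gate the operator cannot pass."""
    trail = []
    for step in steps:
        if step.required_role not in operator_roles:
            trail.append(f"escalate at {step.name}: requires {step.required_role}")
            break
        trail.append(f"{step.name}: {step.action()}")
    return trail


flow = [
    Step("check_service", "operator", lambda: "service down"),
    Step("restart_service", "operator", lambda: "restarted"),
    Step("failover_database", "dba", lambda: "failed over"),
]

frontline_trail = run_gated_flow(flow, {"operator"})
for entry in frontline_trail:
    print(entry)
# check_service: service down
# restart_service: restarted
# escalate at failover_database: requires dba
```

The trail returned to the next level shows which steps already ran, what each produced, and exactly where the hand-off happened, which is how the two features reinforce each other.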
Capture Tribal Knowledge in a Usable/Maintainable Way

Turnover in IT organizations continues to challenge IT management in its efforts to serve the business in a repeatable and consistent way. Because many procedures are undocumented, unused, or out of date, IT Professionals tend to rely on tribal knowledge when responding to incidents. When these professionals leave the organization, they take this core tribal knowledge with them, leaving the IT organization even less able to deal with the volume of critical alerts. With the help of Run Book Automation, however, the situation does not need to be so dire.

With a Run Book Automation tool, IT processes and procedures are captured as automation flows. Because these flows are constantly used to execute diagnostic and repair steps, they are maintained and kept current. With 2-way communication to the CMDB, changes in infrastructure can be initiated (and recorded) or discovered. This linkage enables Run Book Automation flows to be kept current automatically as changes in infrastructure occur. Because tribal knowledge is captured in a usable and easily maintained tool, it stays put when turnover occurs.

Run Book Automation flows can be run in an operator-initiated, visually guided mode. This visual guidance makes it easy for new frontline operators to learn the step-by-step diagnostic and repair processes as they respond to alerts. Run Book Automation tools with this visually guided mode of operation have enabled some IT Operations organizations to cut new-hire ramp-up time in half.
Automate Repetitive Maintenance Procedures

In addition to automating key ITIL processes as discussed in previous sections, Run Book Automation is equally powerful when applied to common, repetitive maintenance procedures. Periodic maintenance procedures are repetitive and time-consuming when performed manually. The tasks are also typically scheduled and predictable, with standard, well-understood processes and predictable outcomes. These characteristics make them well suited to automation. A few examples of these types of tasks include:

- Stopping, starting, and restarting services at timed intervals
- Rebooting and reconfiguring file and print servers
- Changing passwords and creating users
- Log file rotation, scrubbing, and monitoring
- Periodic database defragmentation

Because of this predictability, these processes and procedures can be documented and automated using a Run Book Automation tool, allowing them to be executed as needed by a run book scheduler, by an operator, or as triggered by a specific event. With a Run Book Automation tool, these tasks can be executed in a visually guided way or completely automated. The output of the completed tasks is captured for future audits and reporting. In all cases, Run Book Automation of maintenance and repetitive tasks enables IT Professionals to focus on more critical business issues.
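As an illustration of one such task, the Python sketch below rotates a log file and returns a list of the actions it took, so that the outcome can be kept for audits. It is a simplified example under assumptions: the function name `rotate_log`, the numbered-suffix naming scheme, and the default of keeping 3 old files are hypothetical choices, and a scheduler or event trigger would call the function in practice.

```python
import tempfile
from pathlib import Path
from typing import List


def rotate_log(path: Path, keep: int = 3) -> List[str]:
    """Shift path.1 -> path.2 and so on, move path -> path.1, then start a fresh log.

    Returns a list of actions taken so the outcome can be stored for audits.
    """
    actions = []

    # Drop the oldest file so the shift below never overwrites anything.
    oldest = path.with_name(f"{path.name}.{keep}")
    if oldest.exists():
        oldest.unlink()
        actions.append(f"deleted {oldest.name}")

    # Shift older files up by one, starting from the highest number.
    for i in range(keep - 1, 0, -1):
        src = path.with_name(f"{path.name}.{i}")
        if src.exists():
            src.rename(path.with_name(f"{path.name}.{i + 1}"))
            actions.append(f"moved {src.name} -> {path.name}.{i + 1}")

    # Move the live log aside and create an empty replacement.
    if path.exists():
        path.rename(path.with_name(f"{path.name}.1"))
        actions.append(f"moved {path.name} -> {path.name}.1")
    path.touch()
    return actions


# Demonstration in a temporary directory so no real log files are touched.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "app.log"
    log_file.write_text("old entries\n")
    first = rotate_log(log_file)
    second = rotate_log(log_file)

print(first)   # ['moved app.log -> app.log.1']
print(second)  # ['moved app.log.1 -> app.log.2', 'moved app.log -> app.log.1']
```

Because the function is deterministic and reports what it did, it can be run unattended on a schedule or started by an operator with the same result.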
Integrate Disparate Systems Management Tools and Processes

Deployment success for systems management tools has been mixed. Many tools are only partially deployed, and most are not well integrated with other systems management tools. Most organizations run a very heterogeneous mix of tools from 4, 5, or even more major vendors.

Run Book Automation solutions must be extremely flexible in supporting the broad array of processes that are performed in today's Data and Network Operations Centers. Just as important as broad process support, however, is support for the heterogeneous tools environment found in today's enterprise IT organizations. Leading Run Book Automation solutions provide out-of-the-box integration with common monitoring (Mercury SiteScope, NetIQ AppManager, BMC Patrol, etc.), event console (Tivoli Enterprise Console, Micromuse Netcool, HP OpenView, etc.), CMDB, and ticketing (HP Peregrine, BMC Remedy, etc.) tools.

These out-of-the-box system management integrations enable large enterprises to automate run book processes, integrate disparate processes, and integrate the tools that support these processes. This tool and process integration, combined with automation, leads to an extremely effective, efficient, and proactive IT Operations organization.

Summary

Now is the time to start automating data center operations by leveraging the latest Run Book Automation solutions. Getting started is simple. By focusing on a few key areas, any experienced IT shop can achieve rapid return on investment and a significant reduction in IT complexity and support cost.
For More Information

If you would like additional information about the Opsware System or Opsware Inc., please visit our Web site at www.opsware.com, or call 408-744-7770.

About Opsware Inc.

Opsware, the world's leading IT automation company, unlocks the promise of technology by accelerating IT to zero latency. The company's software, the Opsware System, automates the entire data center, from provisioning to patching, configuration to compliance, and discovery to deployment, turning data center operations into a competitive advantage for business. Opsware's technology is used by hundreds of companies worldwide, including banks, service providers, retailers, manufacturers, and Internet companies, with IT environments ranging from hundreds to tens of thousands of servers, network devices, storage devices, and IT processes.

Opsware is a service mark and trademark of Opsware Inc. All other product names, service marks, and trademarks mentioned herein are trademarks of their respective owners. Copyright 2007 Opsware Inc. NOT for Redistribution. All Rights Reserved.
Corporate Headquarters
599 North Mathilda Avenue
Sunnyvale, California 94085 USA
T 408.744.7300
F 408.744.7383
www.opsware.com

Rev. 5/07