ITIL Capacity Management: Is it really Best Practice or is there room for improvement? Andy Bolton Capacitas Ltd.
Agenda
- Defining Best Practice
- ITIL Overview & ITIL Capacity Management
- ITIL Capacity Management sub-processes
- ITIL Capacity Management activities
- ITIL Capacity Management process interfaces
- How does it fit together?
- What is good, what is missing and what could be done better?
- Conclusions
- Bibliography
Best Practice: A definition
To assess whether ITIL Capacity Management is really best practice, we need a working definition. For this presentation we adopt the following general definition of best practice, based on the relevant Wikipedia entry:
Best Practice determines the most broadly effective and efficient means for organising a system or performing a function.
Note: the downside of this, or any, definition of best practice is that it assumes there is only one way to organise a system or perform a function that is broadly effective and efficient in all circumstances.
Application Lifecycle & Capacity Management
Figure: Software Application Lifecycle. Phases: Concept, Development, Live, Phase-out. Stages: Feasibility, Requirements, Design, Coding, Testing, Roll-out, Live, Changes, End-of-life. Capacity management activities across the lifecycle, in stage order:
- Produce an approximate cost of the system needed to meet the specified performance
- Review whether the requirements are achievable within budget
- Review the design for performance problems, costs and scalability
- Provide design guidance to avoid performance antipatterns
- Review performance testing results for any problems
- Provide a capacity assurance assessment that transient capacity requirements can be met
- Business-as-usual capacity management, ensuring systems can meet the demands placed upon them by current and future work
- Assess all software release changes to the application to ensure they will not affect the system
- Plan for decommissioning requirements, including any transient capacity required for migration to a new platform
A Brief ITIL Overview
"ITIL is the only consistent and comprehensive documentation of best practice for IT Service Management. Used by many hundreds of organisations around the world, a whole ITIL philosophy has grown up around the guidance contained within the ITIL books and the supporting professional qualification scheme. ITIL consists of a series of books giving guidance on the provision of quality IT services, and on the accommodation and environmental facilities needed to support IT. ITIL has been developed in recognition of organisations' growing dependency on IT and embodies best practices for IT Service Management." (Office of Government Commerce (OGC) website)
This claim implies that neither COBIT nor MOF is consistent and comprehensive.
What comprises ITIL Capacity Management?
ITIL 1: The original CCTA publication of ITIL Capacity Management in 1991 (Brian Johnson) consisted of a 92-page book. The original publication set is now known colloquially as ITIL 1.
ITIL 2: Around 2000 the OGC (a successor of the CCTA) released a new set of ITIL books (known as ITIL 2), with the ten service management processes divided into Service Support and Service Delivery. Capacity Management is one of the five subjects within the 300-page Service Delivery book, yet it consists of only 39 pages plus 5 pages of appendices.
ITIL 3: Currently in progress, involving the OGC, itSMF and other industry bodies.
Please note that this presentation is based only on ITIL 2, the current OGC release, and focuses only on the Capacity Management process.
What is ITIL Capacity Management?
It is broken down into three tiered sub-processes:
- Business Capacity Management (IT Capacity Management assisting business decision-making)
- Service Capacity Management (focus on the end-to-end capacity requirements of each service)
- Resource Capacity Management (focus on each individual system's capacity requirements)
This provides a sensible method for partitioning activities by their primary goals, customers and deliverables.
It also contains the following discrete components, most of which are termed activities:
- Iterative activities
- Storage of Capacity Management data
- Demand Management
- Modelling
- Application Sizing
- Production of the Capacity Plan
ITIL Capacity Management Overview
Figure (Crown Copyright 2001): the three sub-processes, Business Capacity Management (BCM), Service Capacity Management (SCM) and Resource Capacity Management (RCM), together with the activities covering all aspects of BCM, SCM and RCM: Iterative Activities, Demand Management, Modelling, Application Sizing, Storage of Capacity Management Data (the CDB) and Production of the Capacity Plan.
Business Capacity Management
"A prime objective of the Business Capacity Management sub-process is to ensure that the future business requirements for IT Services are considered and understood, and that sufficient Capacity to support the services is planned and implemented in an appropriate timescale."
This is the most confused and least well-described of the tiers of capacity management; sadly, a great opportunity missed! ITIL appears to be at a loss, or at least confused, as to what Business Capacity Management (BCM) is. It appears to replicate details about Service Level Requirements (SLRs) and Service Level Agreements (SLAs) that are already covered within Service Capacity Management (SCM). I believe these service-focussed activities should remain wholly within SCM.
Business Capacity Management should be about planning at the business level, driven by business volumetrics, rather than at any service or resource level. This is covered later in this presentation.
Business Capacity Management
Figure (Crown Copyright 2001): the Business Capacity Management flow, comprising: new requirements; identify and agree SLRs; agree budget; design, procure or amend the configuration; negotiate and verify the SLA; sign the SLA; implement under Change Management; update the CMDB / CDB; operational system complies with the SLA; resolve Capacity-related Incidents & Problems.
Service Capacity Management
"A prime objective of the Service Capacity Management sub-process is to identify and understand the IT Services, their use of resource, working patterns, peaks and troughs, and to ensure that the services can and do meet their SLA targets, i.e. to ensure that the IT Services perform as required. In this sub-process, the focus is on managing service performance, as determined by the targets contained in the SLAs or SLRs."
Service Capacity Management (SCM) is focussed on the IT services provided and used, irrespective of the underlying platforms, and so is interested only in the performance and capacity aspects of each service.
However, ITIL suggests that SCM only comes into play "once the service becomes operational". This stems from its confusion over what Business Capacity Management should really be about, which leads it to place the pre-live aspects of Service Capacity Management in that sub-process instead; I believe this is wrong. Service Capacity Management should cover all aspects of the IT Service throughout its lifecycle, including pre-live.
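To make the SCM focus on SLA targets concrete, here is a minimal Python sketch of checking measured response times against a percentile-based SLA target. The 95th-percentile form of the target, the 2-second threshold and the function names are illustrative assumptions, not ITIL definitions; the percentile uses the simple nearest-rank method.

```python
import math
from typing import Sequence, Tuple


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("no samples supplied")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(response_times_s: Sequence[float], target_s: float,
              pct: float = 95.0) -> Tuple[bool, float]:
    """Hypothetical SLA check: is the pct-th percentile within target_s?"""
    observed = percentile(response_times_s, pct)
    return observed <= target_s, observed


# Ten measured response times (seconds) for one service; illustrative data only.
samples = [0.4, 0.6, 0.8, 1.1, 1.3, 1.5, 1.7, 1.9, 2.4, 3.0]
print(meets_sla(samples, target_s=2.0))  # (False, 3.0)
```

A percentile target is used here rather than an average because an average can hide the tail of slow transactions that users actually notice.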
Resource Capacity Management
"A prime objective of Resource Capacity Management is to identify and understand the Capacity and utilisation of each of the component parts in the IT Infrastructure. This ensures the optimum use of the current hardware and software resources in order to achieve and maintain the agreed service levels. All hardware components and many software components have a finite capacity, which, when exceeded, has the potential to cause performance problems."
Resource Capacity Management (RCM) is focussed on reviewing individual components of the IT infrastructure, usually at platform level, such as Solaris, Windows, z/OS, etc. It concerns resources such as processors, memory, disk and network, and so recognises the need to collect resource utilisation information on a regular ("iterative") basis. It recommends that monitors be installed on the individual hardware and software components, configured to collect the necessary data.
As this has traditionally been the most common form of Capacity Management, it is surprising to find it covered in only four paragraphs. On the positive side, RCM does then go on to cover:
- the need for capacity managers to understand, and recommend, the benefits of new technology
- the need for capacity managers to cover resilience as part of their responsibility
Iterative Activities
ITIL groups many of the business-as-usual activities together because they need to be carried out iteratively and "form a natural cycle"; it calls these the iterative activities, as shown in the diagram on the following slide.
- The Monitoring activity focuses on monitoring the utilisation of resources and services; typical data includes CPU utilisation, transactions per second, transaction response time and queue lengths.
- The Analysis activity should identify trends from which the normal utilisation and service level, or baseline, can be established.
- The Tuning activity is where areas of the configuration identified in the Analysis activity are tuned to make better use of system resources or to improve the performance of a particular service.
- The Implementation activity introduces into live operational services any Changes that have been identified by the Monitoring, Analysis and Tuning activities.
ITIL Capacity Management: iterative activities
Figure (Crown Copyright 2001): a cycle of Monitoring, Analysis, Tuning and Implementation around the Capacity Management Database (CDB). The cycle uses resource utilisation thresholds and SLM thresholds, and produces resource utilisation exception reports and SLM exception reports.
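The Monitoring and Analysis steps can be illustrated with a minimal Python sketch: it derives an hourly baseline from utilisation samples and produces a simple resource utilisation exception report. The (hour, utilisation) sample format and the 80% threshold are assumptions for illustration; real thresholds would come from the CDB's technical data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, float]  # (hour of day, CPU utilisation %)


def baseline_by_hour(samples: Iterable[Sample]) -> Dict[int, float]:
    """Analysis: average utilisation for each hour of day across all days."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for hour, util in samples:
        by_hour[hour].append(util)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def exception_report(samples: Iterable[Sample],
                     threshold_pct: float = 80.0) -> List[Sample]:
    """Monitoring: samples that breach the resource utilisation threshold."""
    return [(hour, util) for hour, util in samples if util > threshold_pct]


# Two days of samples for hours 9-11 (illustrative data only).
samples = [(9, 55.0), (10, 70.0), (11, 40.0),
           (9, 65.0), (10, 90.0), (11, 44.0)]
print(baseline_by_hour(samples))   # {9: 60.0, 10: 80.0, 11: 42.0}
print(exception_report(samples))   # [(10, 90.0)]
```

Note that the 10:00 baseline sits exactly at the threshold while one individual sample breaches it: the baseline and the exception report answer different questions.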
Capacity Management Database (CDB)
The Capacity Management Database (CDB) is the cornerstone of a successful Capacity Management process. Data in the CDB is stored and used by all the sub-processes of Capacity Management because it is a repository that that [sic] holds a number of different types of data viz. business, service, technical, financial and utilisation data. However, the CDB is unlikely to be a single database and probably exists in several physical locations.
The CDB is the central repository for all capacity management reporting and as such should contain (for all platforms, services and businesses):
- Business data
- Service data
- Technical data
- Financial data
- Utilisation data
Capacity Management Database (CDB)
ITIL Capacity Management specifies the following as inputs to the CDB:
- Business data, including: number of accounts and products supported; seasonal variations of anticipated workloads
- Service data, including: response times; SLM thresholds
- Technical data, including: resource utilisation limitations, e.g. 40% utilisation for a shared Ethernet segment
- Financial data, including: financial plans; IT budgets
- Utilisation data, including: CPU utilisation for servers; number of transactions and response times for applications
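One way to picture the CDB as a single logical repository over data that may physically live in several places is a common record type tagged with the five ITIL data categories. The minimal Python sketch below assumes a simple (source, metric, timestamp, value) record shape; the class and field names are hypothetical, not part of ITIL.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class DataType(Enum):
    BUSINESS = "business"        # e.g. number of accounts supported
    SERVICE = "service"          # e.g. response times, SLM thresholds
    TECHNICAL = "technical"      # e.g. utilisation limits for a component
    FINANCIAL = "financial"      # e.g. IT budgets
    UTILISATION = "utilisation"  # e.g. CPU utilisation for servers


@dataclass(frozen=True)
class CdbRecord:
    data_type: DataType
    source: str        # platform, service or business unit the data describes
    metric: str
    timestamp: datetime
    value: float


@dataclass
class CapacityDatabase:
    """In-memory stand-in for the logical CDB (hypothetical)."""
    records: List[CdbRecord] = field(default_factory=list)

    def add(self, record: CdbRecord) -> None:
        self.records.append(record)

    def query(self, data_type: DataType,
              metric: Optional[str] = None) -> List[CdbRecord]:
        """Return records of one data type, optionally filtered by metric."""
        return [r for r in self.records
                if r.data_type is data_type
                and (metric is None or r.metric == metric)]


cdb = CapacityDatabase()
when = datetime(2005, 1, 31)
cdb.add(CdbRecord(DataType.BUSINESS, "retail banking", "accounts", when, 150_000))
cdb.add(CdbRecord(DataType.UTILISATION, "db-server-01", "cpu_pct", when, 62.5))
print(len(cdb.query(DataType.UTILISATION)))  # 1
```

Tagging every record with its data type is what lets any sub-process pull, say, business data alongside utilisation data, regardless of where each was collected.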
Capacity Management Database (CDB)
ITIL Capacity Management specifies the following as outputs of the CDB:
- Service and Component Based Reports: reports must be produced to illustrate how the service and its constituent components are performing and how much of its maximum Capacity is being used.
- Exception Reporting: reports that show when the Capacity and performance of a particular component or service become unacceptable are also a required output.
- Capacity Forecasts: the Capacity Management process must predict future growth. To do this, future component and service Capacity must be forecast. A simple example of a Capacity forecast is a correlation between a business driver and a component utilisation, e.g. CPU utilisation against the number of accounts supported by the company.
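The simple Capacity forecast described above can be sketched as an ordinary least-squares fit of CPU utilisation against the number of accounts, extrapolated to a planned account volume. This is a minimal Python sketch with made-up figures; a straight-line fit is an assumption, and the forecast is only as good as the business driver's link to the workload.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; also returns Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # strength of the business-driver correlation
    return slope, intercept, r


# Quarterly history: accounts (thousands) vs peak CPU utilisation (%); illustrative.
accounts_k = [100, 120, 140, 160]
cpu_pct = [40.0, 46.0, 52.0, 58.0]
slope, intercept, r = fit_line(accounts_k, cpu_pct)
forecast = slope * 200 + intercept  # business plan: 200k accounts
print(round(slope, 3), round(intercept, 3), round(r, 3), round(forecast, 1))
# 0.3 10.0 1.0 70.0
```

A Pearson's r close to 1 says the driver explains the history well; it does not prove the relationship will hold at the new volume.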
Demand Management
"The prime objective of Demand Management is to influence the demand for computing resource and the use of that resource."
At first sight this is a really strong inclusion in ITIL Capacity Management, as too many capacity professionals concentrate only on controlling supply, forgetting that demand is the other side of the equation. ITIL does recognise the difficulty of operating Demand Management, as it could cause damage to the business Customers or to the reputation of the IT organisation, but it does not seem to acknowledge that workload characterisation is necessary to undertake it accurately. It covers this important topic in only seven paragraphs, filling less than one page!
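Workload characterisation can start as simply as breaking peak-period resource consumption down by workload class, so that Demand Management knows which demand it could reasonably influence. The minimal Python sketch below assumes a (workload class, hour, CPU seconds) record per transaction; the class names and peak hours are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Txn = Tuple[str, int, float]  # (workload class, hour of day, CPU seconds)


def cpu_share_by_class(transactions: Iterable[Txn],
                       peak_hours: Set[int]) -> Dict[str, float]:
    """Fraction of peak-hour CPU consumed by each workload class, largest first."""
    totals: Counter = Counter()
    for workload, hour, cpu_s in transactions:
        if hour in peak_hours:
            totals[workload] += cpu_s
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {wl: cpu_s / grand_total for wl, cpu_s in totals.most_common()}


txns = [("online", 10, 30.0), ("batch", 10, 60.0),
        ("reporting", 10, 10.0), ("batch", 2, 200.0)]
print(cpu_share_by_class(txns, peak_hours={10}))
# {'batch': 0.6, 'online': 0.3, 'reporting': 0.1}
```

Here batch work takes 60% of the peak-hour CPU; if it is deferrable, rescheduling it is a demand-side alternative to buying more capacity.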
Modelling
"A prime objective of Capacity Management is to predict the behaviour of IT Services under a given volume and variety of work. Modelling is an activity that can be used to beneficial effect in any of the sub-processes of Capacity Management."
According to ITIL Capacity Management, Modelling offers only the following options:
- Trend Analysis
- Analytical Modelling
- Simulation Modelling
- Baseline Models
However, ITIL barely distinguishes where each of these techniques should be used; it appears simply to offer them as a toolkit of available modelling methods. Although recognised as an underlying support activity for the overall process, Modelling is documented in only ten paragraphs. ITIL explains modelling quite poorly, appearing to treat a baseline model as a type of model in its own right, while including Trend Analysis, which is really a forecasting technique.
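To show what distinguishes Analytical Modelling from Trend Analysis, here is the simplest textbook analytical model, a single-server M/M/1 queue, in which response time R = S / (1 - U) with utilisation U = λS for arrival rate λ and mean service time S. Treating a service as an M/M/1 queue is an assumption made purely for illustration; real systems usually need multi-class or multi-server models, or simulation.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """M/M/1 mean response time R = S / (1 - U), where U = arrival_rate * S."""
    if arrival_rate < 0 or service_time <= 0:
        raise ValueError("arrival rate must be >= 0 and service time > 0")
    utilisation = arrival_rate * service_time
    if utilisation >= 1.0:
        raise ValueError(f"saturated: utilisation {utilisation:.0%} >= 100%")
    return service_time / (1.0 - utilisation)


# 50 ms of service per transaction (illustrative). Doubling the load from
# 8 to 16 transactions/s roughly triples the response time.
for tps in (4, 8, 12, 16, 18):
    r = mm1_response_time(tps, 0.05)
    print(f"{tps:>2} tps  U={tps * 0.05:.0%}  R={r * 1000:.0f} ms")
```

Unlike a trend line, the model predicts the sharp, non-linear rise in response time as utilisation approaches 100%, which is exactly what a forecast extrapolated from light-load history would miss.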
Application Sizing
"The primary objective of Application Sizing is to estimate the resource requirements to support a proposed application Change or new application, to ensure that it meets its required service levels. To achieve this, application sizing has to be an integral part of the application lifecycle."
Importantly, ITIL recognises that "it is much easier and less expensive to achieve the required service levels if the application design considers the required service levels at the very beginning of the application lifecycle, rather than at some later stage"; however, it does not explicitly state the role Capacity Management has in performance assurance, or vice versa.
This is probably the most important recommendation in ITIL Capacity Management, so it deserves stronger emphasis than a mere seven short paragraphs. Unfortunately, this recognition of the importance of Capacity Management within the development lifecycle is not a mandatory requirement. It also doesn't translate well into BS 15000, the closely related British Standard, which only states that the capacity management process should provide support to the development of new and changed services.
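At its simplest, Application Sizing turns an estimated cost per transaction into a resource requirement. The Python sketch below assumes that CPU is the constraining resource, that the per-transaction CPU cost is known (e.g. from a prototype or benchmark), and a planning ceiling of 70% utilisation; all three are illustrative assumptions.

```python
import math


def cores_required(peak_tps: float, cpu_s_per_txn: float,
                   max_utilisation: float = 0.7) -> int:
    """Cores needed so that peak CPU demand stays under max_utilisation."""
    if not 0 < max_utilisation <= 1:
        raise ValueError("max_utilisation must be in (0, 1]")
    busy_cores = peak_tps * cpu_s_per_txn  # CPU-seconds consumed per second
    # round() guards against float noise such as 5.000000000000001 -> 6 cores.
    return math.ceil(round(busy_cores / max_utilisation, 9))


# 200 transactions/s at 20 ms CPU each keeps 4 cores busy; at a 70% ceiling
# that needs 4 / 0.7 = 5.7, i.e. 6 cores (illustrative figures).
print(cores_required(200, 0.02))  # 6
```

Applying the same arithmetic at design time, while the per-transaction cost can still be changed, is what makes sizing part of the lifecycle rather than an afterthought.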
Capacity Plan
"The prime objective is to produce a plan that documents the current levels of resource utilisation and service performance, and after consideration of the business strategy and plans, forecasts the future requirements for resources to support the IT Services that underpin the business activities. The plan should indicate clearly any assumptions made. It should also include any recommendations quantified in terms of resource required, cost, benefits, impact etc."
ITIL refers to this activity as Production of the Capacity Plan. The Capacity Plan is the fundamental output that any capacity management function must deliver, yet ITIL accords it only four sentences in addition to the objective paragraph above (plus a template Capacity Plan in an annex).
ITIL recommends that capacity plans be published annually, in line with the business or budget lifecycles, and updated quarterly thereafter. This recommendation does not recognise that a Capacity Plan should really be produced in line with the rate of change on the platform or service under scrutiny. For example, a government department may only require an annual capacity plan, but an Internet-based merchant could benefit from monthly capacity plans. ITIL does, however, mention that this may be required in extreme cases.
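The argument that plan frequency should follow the rate of change can be made concrete by estimating how long the current headroom lasts. The Python sketch below assumes compound monthly workload growth and an 80% utilisation threshold; the growth rates for the government department and the Internet merchant are hypothetical.

```python
from typing import Optional


def months_to_threshold(current_pct: float, threshold_pct: float,
                        monthly_growth: float) -> Optional[int]:
    """Whole months until compound growth takes utilisation to the threshold.

    Returns None if utilisation never reaches the threshold.
    """
    if current_pct >= threshold_pct:
        return 0
    if monthly_growth <= 0 or current_pct <= 0:
        return None
    months, utilisation = 0, current_pct
    while utilisation < threshold_pct:
        utilisation *= 1 + monthly_growth
        months += 1
    return months


# Both platforms run at 50% today (illustrative figures).
print(months_to_threshold(50, 80, 0.01))  # government department, 1%/month: 48
print(months_to_threshold(50, 80, 0.10))  # Internet merchant, 10%/month: 5
```

With five months of headroom, an annual plan refreshed quarterly could arrive after the threshold has already been breached, whereas the slower-growing platform is well served by an annual cycle.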
Activity Frequency
ITIL describes when the various activities should be undertaken as follows:
- On-going: Iterative activities; Demand Management; Storage of Capacity Management Data
- Ad-hoc: Modelling; Application Sizing
- Regularly: Production of the Capacity Plan
It also states that any one of the sub-processes of Capacity Management may carry out any of the activities, with the data that is generated being stored in the CDB.
Process Interfaces
Figure: the interfaces between Capacity Management and the other processes.
Service Support:
- Incident Management: information and resolutions on Capacity-related Incidents
- Problem Management: assistance and resolutions on Capacity-related Problems
- Change Management: assessing Changes for Capacity impact
- Configuration Management: provision of Configuration Item information
- Release Management: assistance with developing the distribution strategy
Service Delivery:
- Service Level Management: ensuring that performance and Capacity targets can be achieved in SLAs
- Availability Management: close alignment, as capacity issues result in service unavailability
- IT Service Continuity Management: determination of Capacity requirements for all recovery options
- Financial Management: provision of cost summaries and Charging mechanisms
Application Management: calculation of required capacity via the Application Sizing activity
How does it all fit together?
Figure: the Software Application Lifecycle shown earlier, annotated with the ITIL Capacity Management activities: Application Sizing, Modelling, the Iterative Activities (including Performance Monitoring), Demand Management and the Capacity Plan. Legend: business-focussed activities; supporting activity; tools; Capacity Database.
What is good in ITIL Capacity Management?
- Coverage of Response Time Monitoring
- The ITIL Framework for Service Management
- ITIL Capacity Management, although basic, has a good breadth
- Recognition of the potentially high cost of Capacity Management, especially tools
- Recognition of the potentially valuable benefits of Capacity Management, including:
  - Increased effectiveness and cost savings
  - Reduced risk
  - More confidence in forecasts
  - Value to the application lifecycle
- Interfaces to other Service Management processes
- Capacity Plan template
- Close relationship between Capacity Management and Availability Management
- Capacity Management Database overview
- Activity frequency timetable
- Planning, implementation and review of the Capacity Management process
What is good in ITIL Capacity Management?
- Recognises the shortcomings of a "pay for upgrades as required" approach to capacity management
- Recognises the complexity of distributed capacity management compared with the good old days of the mainframe
- Recognises the dependence of other service management processes on an effective capacity management process
- Good Capacity Management ensures NO SURPRISES
- Recognition that capacity management is about meeting current and future business requirements cost-effectively
- The Capacity Management process's goal is to ensure that cost-justifiable IT Capacity always exists and that it is matched to the current and future identified needs of the business
- Scope of the Capacity Management process: it should be the focal point for all IT performance and capacity issues
- Capacity Management has a close, two-way relationship with the business strategy and planning process
What is good in ITIL Capacity Management?
- Recognition that the Capacity Management process requires accurate information on the business and IT strategy and plans to function effectively
- Capacity Management needs to assess all changes for their impact on the capacity of the infrastructure
- Recognition that Capacity Management process activities are categorised into proactive and reactive activities
- The more successful the proactive activities of Capacity Management, the less need there will be for the reactive activities
- Capacity Management should not be a last-minute tick in the box just prior to Operations Acceptance and Customer Acceptance
- Recognition that SLAs should be verified by the Capacity Management process using modelling
- Recognition that Capacity Management should identify new technology opportunities
- Capacity Management is a key enabler for business success
- The concept of a stratified approach encompassing Business, Service and Resource Capacity Management
What is missing from ITIL Capacity Management?
1. Recognition of the need for performance assurance / performance engineering within Capacity Management:
   - Application Sizing and Modelling appear to be used simply to size target platforms reactively, rather than to help optimise the application design during the development lifecycle
   - Without some level of performance assurance / performance engineering, SLRs may not be met on any size of platform
   - Performance risk analysis at the Change stage involving Capacity Management
2. Application co-existence modelling (within Modelling)
3. Workload characterisation, profiling, modelling & management
4. Demand Forecasting provided in business units (number of accounts, etc.)
5. Organisational structures for large organisations
6. An explicit requirement for marketing forecasts to be passed to Capacity Management
What is missing from ITIL Capacity Management?
Figure: the Software Application Lifecycle shown earlier, annotated with both the ITIL activities (Application Sizing, Modelling, Iterative Activities, Performance Monitoring, Demand Management, Capacity Plan) and the missing activities (Workload Characterisation, Performance Assurance, Demand Forecasting), supported by the Capacity Database.
What could be done better?
Firstly, Business Capacity Management: if you consider this to be aligned and interfacing with the Business Management tier, then this is a major lost opportunity. The business converses in metrics that are not easily useful, or even identifiable, to the IT user. An example in a financial services company would be:
- Business Manager: talks in number of customer accounts, funds under management, etc.
- Service Manager: talks in number of IT services, SLAs, response times, etc.
- Resource Manager: talks in number of servers, OS, hardware specification & configuration, etc.
These are not the same language, and the translation between them is often non-trivial. A Business Manager will use the units or metrics that he understands, cares about and relates to his bonus! So, for example, he could be interested in the number of customer accounts the company has and expects to obtain in the future. Customer accounts may be useful for some simple capacity metrics, but generally they do not map 1:1 to any resource or service. More detail along these lines would be extremely useful.
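To illustrate why business volumes rarely map 1:1 to resources, the minimal Python sketch below chains a business metric (customer accounts) to a service metric (peak transactions per second) and then to a resource metric (busy CPU cores). The ratios, the fixed load not driven by accounts, and the class name are all hypothetical; in practice they would come from workload characterisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadModel:
    peak_tps_per_1000_accounts: float  # business -> service translation
    fixed_peak_tps: float              # load not driven by accounts (e.g. batch feeds)
    cpu_s_per_txn: float               # service -> resource translation

    def peak_tps(self, accounts: int) -> float:
        """Service view: peak transactions per second for a number of accounts."""
        return self.fixed_peak_tps + self.peak_tps_per_1000_accounts * accounts / 1000

    def busy_cores(self, accounts: int) -> float:
        """Resource view: CPU cores kept busy at that peak."""
        return self.peak_tps(accounts) * self.cpu_s_per_txn


model = WorkloadModel(peak_tps_per_1000_accounts=0.5, fixed_peak_tps=20.0,
                      cpu_s_per_txn=0.04)
for accounts in (200_000, 400_000):
    print(accounts, model.peak_tps(accounts), round(model.busy_cores(accounts), 2))
```

Doubling the accounts from 200,000 to 400,000 raises the peak from 120 to 220 transactions per second, not to 240, because part of the load does not scale with accounts.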
What could be done better?
Other areas of ITIL Capacity Management that could be improved:
- The Application Sizing and Modelling sections should be rewritten to explain these complex subjects more effectively
- While ITIL recognises the Time-to-Market pressures on the Capacity Management process, it provides no advice or recommendations to help
- The relationships between the activities need more detailed explanation
- The increased complexity of Distributed Capacity Management is mentioned briefly but not elaborated on
- Many important details of the activities and sub-processes are covered only in the Implementation section
- It needs to recognise that Capacity Management recommendations will often be ignored until too late (a lack of pragmatism)
- Recognition of the IT organisation's reliance on Capacity Management to produce a consolidated budget forecast
- The dependence on Capacity Management to provide transient capacity in Service Continuity situations is not well explained
Conclusions
Q. Is ITIL Capacity Management really best practice under our working definition?
A. No. It is, on balance, a very good starting point, but it lacks the consistent and coherent philosophy that should be evident within a best practice document. It could arguably be called good practice, though.
Q. Why do I think it isn't best practice?
A. In summary, for the following reasons:
- It defines a broad set of activities that everyone should undertake to achieve appropriate Capacity Management, without recognising differing circumstances across organisations.
- It poorly describes key activities such as Modelling, Application Sizing and Production of the Capacity Plan.
- It poorly describes the key sub-process of Business Capacity Management.
- It is missing key activities, including Performance Assurance, Workload Characterisation and Demand Forecasting.
Any Questions?
Andy Bolton, Capacitas Ltd.
Bibliography
IT Infrastructure Library:
- Service Delivery, TSO Books, 2001
- Application Management, TSO Books, 2002
[Please note that ITIL and IT Infrastructure Library are Registered Trade Marks of OGC.]
British Standards:
- BS 15000-1:2002, IT service management, Part 1: Specification for service management
- BS 15000-2:2003, IT service management, Part 2: Code of practice for service management
Other Service Management & IT Governance Frameworks:
- Microsoft Operations Framework, Microsoft Corporation, www.microsoft.com
- COBIT (Control Objectives for Information and related Technology), IT Governance Institute (ITGI), www.itgi.org