Real World Proactive ITIL Continuous Improvement Practices Part 1 Mickey Nakamura
Part 1 Topics (Mickey Nakamura)
- Purpose of Today's Discussion
- Benefits of Proactive Continuous Improvement
- Lifecycle Approach Based on ITIL and Six Sigma
- How?
  - Problem Management Life-cycle Approach
  - Intelligent Configuration (Asset) Management
Part 2: Presented by Maziar Adl (Part 2 Topics)
Open Discussion and Q&A
Purpose of Today's Discussion
The purpose of today's discussion is to share real-world ITIL proactive continuous improvement practices to transform your organization.
Benefits of Proactive Approach
- Self-Sustaining Culture of Service Excellence
- Superior Customer Satisfaction
- Continual Service Improvement (CSI) Infused Across the Entire Organization
- Proactive Critical Business Impact Alignment and Communications to Customers
- Higher Predictability and Efficiencies
Benefits of Proactive Approach - Cases
- From 2007 to 2011: Designed and implemented proactive ITIL Continual Service Improvement practices that improved service level objective (SLO) metrics by 45% and saved $540,000 monthly in program deliverables for one of the largest local government managed service programs in the nation.
- In 2011: Managed 369 critical incidents with a mean time to repair of 1.52 hours, which was 63% below SLA requirements.
- From 2007 to 2011: Developed 900 Root Cause Analysis (RCA) cases, resulting in the implementation of over 1,200 corrective actions and continuous improvement milestone savings of $6 million annually.
- In 2010: Team achieved a Gartner Industry Customer Satisfaction "best in class" score of 3.98 out of 4.10.
Problem Management Life-cycle Approach - Continual Service Improvement Results: Critical Issues 2007-2010
[Chart: Priority 1 and Priority 2 Incidents - Yearly Results. Left axis: # Tickets; right axis: Hours; horizontal axis: Years; series: QTY/Year and Avg. Duration.]
- QTY/Year: 794, 500, 407, 421
- Avg. Duration (hours): 2.22, 1.76, 1.45, 1.31
- Trend: decrease of 373 critical incidents (47% improvement)
- Trend: decrease of 0.91 hours in MTTR (41% improvement)
Problem Management Life-cycle Approach - Continual Service Improvement Results 2007-2010
[Chart: Help Desk Contacts - Total Annual Contacts: 2007-2010. Series: Total / Year.]
- 2007: 141,841
- 2008: 132,724
- 2009: 102,604
- 2010: 98,241
- Trend: decrease of 43,600 tickets (31% improvement)
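The trend figures on the two results slides above can be re-derived from the plotted values. The following is a minimal sketch, assuming the four values in each chart run in order from 2007 to 2010; the function name `trend_improvement` is illustrative and not part of the presentation.

```python
def trend_improvement(first: float, last: float) -> tuple[float, float]:
    """Return (absolute decrease, percent decrease) from first to last value."""
    decrease = first - last
    return decrease, 100.0 * decrease / first


# First and last plotted values as read from the charts
# (assumption: the first value is 2007 and the last is 2010).
series = {
    "P1/P2 incidents per year": (794, 421),
    "Average duration (hours)": (2.22, 1.31),
    "Help desk contacts per year": (141841, 98241),
}

for name, (first, last) in series.items():
    decrease, percent = trend_improvement(first, last)
    print(f"{name}: decrease of {decrease:,.2f} ({percent:.0f}%)")
```

Rounded to whole percentages, the three results agree with the 47%, 41%, and 31% improvements stated on the slides.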
Proactive Problem Management Life-cycle Approach
Problem Management Life-cycle Approach: ITIL View of Problem Management
In the ITIL (Information Technology Infrastructure Library) view, the objective of Problem Management is to minimize the adverse impact of problems and prevent the recurrence of incidents that impact service delivery. Problem Management drives to:
- Reduce failures to an acceptable risk at an acceptable cost
- Reduce the number of problems
- Prevent recurrence
- Prevent problems through trend analysis
- Address primary root causes of problems
- Ensure Minimum Acceptable Service Levels (MASL) are achieved
- Reduce the impact of problems
- Reduce the duration of any associated outages
- Manage problems within agreed time frames
- Increase the productivity of human resources involved in executing the Problem Management process
- Monitor and measure the process in order to improve the process itself
Problem Management Life-cycle Approach: Real World View
- Recurring Prioritized Issues or Incidents
- Corrective Actions or Service Improvement
Problem Management Life-cycle Approach: Real World View
Empower a Problem Management Team (Role: Responsibilities)
Problem Manager:
- Has executive sponsorship and horizontal authority across the organization
- Owns the Problem Management process
- Records problem information
- Works with management to establish problem priorities and monitors problem resolution progress
- Analyzes and reports problem data
- Coordinates problem management activities
- Determines and assigns action items to prevent future failures
- Identifies, records, tracks, and corrects issues impacting service delivery; recognizes recurring problems; addresses procedural issues; and contains or reduces the impact of problems that occur
- Conducts reviews of problem resolutions
Problem Analyst:
- Analyzes all available data on related incidents and problems
- Works with the Problem Manager and Management Team to determine problem resolution actions and priorities
- Documents problem resolution
- Performs root cause analysis and problem resolution
Configuration Manager:
- Works with the technical team and Problem Manager to identify configuration items impacted by or causing problems and to resolve problems
Technology Area Subject Matter Expert:
- Investigates and diagnoses problems
- Defines and identifies work-arounds
- Implements problem resolutions
Management Team:
- Approves problem priority
- Monitors progress
Root Cause Analysis Basics: Definitions
- Root Cause: the cause that, if corrected, would prevent the recurrence of this and similar occurrences
- Contributing Cause: a cause that contributed to an occurrence but by itself would not have caused the occurrence
- Root Cause Analysis: the process of identifying the root and contributing causes
- Root Cause Analysis Process: the Root Cause Analysis process, beginning at the close of a Service Restoration Team
Problem Management Life-cycle Approach: Real World Steps
Timeline Procedure Example
RCA Cause & Effect Fishbone Diagram Examples
[Diagram 1: fishbone for the problem "Unable to log on to Application". Branches: Management, Design, Personnel/Training, Procedure, External, Software, Hardware. Labels shown: Upgrade Not Tested; Problem Call; Network Issue; SQL Server at 100% Capacity; Web Server Failed.]
[Diagram 2: Software Fishbone for "SQL Server at 100% Capacity". Branches: Management, Design, Personnel/Training, Procedure, External, Software, Hardware. Labels shown: Application not Load Tested; Poor Use of SQL Call in SW Upgrade.]
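A fishbone diagram is, in effect, a problem statement plus candidate causes grouped by category, so it can be captured as a small data structure for tracking. The sketch below is illustrative only: the `Fishbone` class is hypothetical, and the assignment of each cause to a branch is an assumption, since the slide text does not preserve which cause hangs off which branch.

```python
from dataclasses import dataclass, field


@dataclass
class Fishbone:
    """A cause-and-effect diagram: one problem, causes grouped by category."""
    problem: str
    branches: dict[str, list[str]] = field(default_factory=dict)

    def add_cause(self, category: str, cause: str) -> None:
        # Create the branch on first use, then record the cause under it.
        self.branches.setdefault(category, []).append(cause)

    def render(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for category, causes in self.branches.items():
            text = "; ".join(causes) if causes else "(none identified)"
            lines.append(f"  {category}: {text}")
        return "\n".join(lines)


# Branch names are taken from the slide; the cause-to-branch mapping
# below is an assumption made for illustration.
diagram = Fishbone("Unable to log on to Application")
for category in ("Management", "Design", "Personnel/Training",
                 "Procedure", "External", "Software", "Hardware"):
    diagram.branches[category] = []
diagram.add_cause("Procedure", "Upgrade Not Tested")
diagram.add_cause("External", "Network Issue")
diagram.add_cause("Software", "SQL Server at 100% Capacity")
diagram.add_cause("Hardware", "Web Server Failed")

print(diagram.render())
```

A cause that needs deeper analysis, such as "SQL Server at 100% Capacity", can become the `problem` of a second `Fishbone`, which mirrors the drill-down shown in the second diagram.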
Problem Management RCA Corrective Actions Tracking Example
[Chart: 2010 Corrective Actions]
- Total Corrective Actions Developed: 276
- Total - Completed: 234
- Total - Pending: 34
- Total - Open, Requiring County Action: 8
Problem Management Life-cycle Approach - Continual Service Improvement Results: Corrective Actions by Area
RCA Corrective Action Items Aging by Framework for Week Ended Jan 31, 2009
Framework      Implemented  On Time  Past Due  Total
Applications            51        9         0     60
Network                 47        5         0     52
DataCenter              39        9         0     48
Desktop                  7        1         0      8
Total                  144       24         0    168
Implemented = 144; On Time = 24; Past Due = 0; Total Corrective Action Items = 168
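An aging report of this kind can be produced by classifying each corrective action against the report date. The following is a minimal sketch under stated assumptions: the `CorrectiveAction` record, its field names, and the sample items are hypothetical, and the rule that an item due on the report date still counts as on time is a choice made here, not something the slide specifies.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    framework: str                      # e.g. Applications, Network, DataCenter, Desktop
    due: date                           # agreed completion date
    implemented_on: Optional[date] = None


def status(action: CorrectiveAction, as_of: date) -> str:
    """Classify one action as Implemented, On Time, or Past Due."""
    if action.implemented_on is not None and action.implemented_on <= as_of:
        return "Implemented"
    return "Past Due" if action.due < as_of else "On Time"


def aging_by_framework(actions: list[CorrectiveAction], as_of: date) -> Counter:
    """Count actions per (framework, status) pair."""
    counts: Counter = Counter()
    for action in actions:
        counts[(action.framework, status(action, as_of))] += 1
    return counts


# Hypothetical sample data, not taken from the presentation.
AS_OF = date(2009, 1, 31)
actions = [
    CorrectiveAction("Applications", due=date(2009, 1, 15), implemented_on=date(2009, 1, 12)),
    CorrectiveAction("Applications", due=date(2009, 2, 20)),
    CorrectiveAction("Network", due=date(2009, 1, 10)),
    CorrectiveAction("Desktop", due=date(2009, 1, 31)),
]

for (framework, state), count in sorted(aging_by_framework(actions, AS_OF).items()):
    print(f"{framework:<13}{state:<12}{count}")
```

Keeping the classification in one small function makes the Past Due rule easy to review with the customer before the weekly report is published.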
Problem Management Life-cycle Approach - Continual Service Improvement Results: Top 20 Category Types
[Chart: P1 & P2 Incidents by Product Type for December 2010. Values are count/percent of total; Total = 40.]
Framework totals:
- Application: 22/55%
- TTC: 7/18%
- Datacenter: 3/8%
- Network: 3/8%
- Other: 3/8%
- Desktop: 2/5%
By product type:
- Qanet (Application) = 10/25%
- SDIR = 3/8%
- Documentum = 2/5%
- Kiva = 2/5%
- Kronos = 1/3%
- Kofax = 1/3%
- Millennium = 1/3%
- PHIS = 1/3%
- ProScript = 1/3%
- Phone Outage = 3/8%
- TTC = 7/18%
- Server CoSDA0569P = 1/3%
- CWW = 1/3%
- Anasazi = 1/3%
- Printer (Desktop) = 1/3%
- Qanet (Printing) = 1/3%
- Qanet (Other) = 2/5%
- CWW = 1/3%
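The count/percent pairs in this breakdown follow from dividing each count by the monthly total and rounding halves upward. The sketch below reproduces the framework totals from the slide's counts; the helper name `percent_half_up` is illustrative.

```python
def percent_half_up(count: int, total: int) -> int:
    """Whole-number percentage of total, with halves rounded up (2.5 -> 3)."""
    return (200 * count + total) // (2 * total)


# Framework counts as listed on the slide.
framework_counts = {
    "Application": 22,
    "TTC": 7,
    "Datacenter": 3,
    "Network": 3,
    "Other": 3,
    "Desktop": 2,
}

total = sum(framework_counts.values())
for name, count in sorted(framework_counts.items(), key=lambda item: -item[1]):
    print(f"{name}: {count}/{percent_half_up(count, total)}%")
print(f"Total = {total}")
```

Integer arithmetic is used deliberately: Python's built-in `round` rounds halves to the nearest even number, so `round(2.5)` gives 2, whereas the slide reports one incident out of 40 as 3%.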
Intelligent Configuration (Asset) Management
Intelligent Configuration (Asset) Management
- Purpose
- Benefits and Value
- Top-Down Critical Services Strategy
Intelligent Configuration (Asset) Management: Purpose
The purpose is to proactively identify and assess critical IT service and business impacts by building relationships that link IT asset Configuration Items (CIs) within the Configuration Management Database (CMDB).
Intelligent Configuration (Asset) Management: Benefits and Value
- Improved Incident Management effectiveness and resolution
- Proactive communication to IT and customers during outages
- Enhanced ability to determine root cause and develop corrective actions
- Improved ability to analyze the risk and business impacts of Changes/CRQs
- Increased ITIL process maturity and integration across the enterprise
Top-Down Strategy: Critical Services & Business Impacts
Top-down hierarchy approach (business impacts/relationships): map and build CIs per Critical Service and link them to Business Impacts and Customers.
1. Critical Services linked to new CIs (Business Impact - Customers)
2. Critical Infrastructure linked to new CIs (Business Impact - Customers)
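Once CIs are linked top-down from critical services to infrastructure, the same relationships can be walked bottom-up to answer "which services and customers are affected if this CI fails?". The following is a minimal sketch, not a real CMDB API: the CI names, the `DEPENDS_ON` and `SERVICE_CUSTOMERS` tables, and the customer labels are all hypothetical, loosely modeled on the service and datacenter examples named in this section.

```python
from collections import deque

# Hypothetical CI relationships: each key depends on the CIs in its list.
DEPENDS_ON = {
    "Enterprise BlackBerry": ["bes-server-01"],
    "bes-server-01": ["rack-a12"],
    "rack-a12": ["breaker-07"],
    "breaker-07": ["pdu-02"],
    "Enterprise Email": ["mail-server-01"],
    "mail-server-01": ["rack-b03"],
    "rack-b03": ["breaker-09"],
    "breaker-09": ["pdu-03"],
}

# Hypothetical business impact: customers of each critical service.
SERVICE_CUSTOMERS = {
    "Enterprise BlackBerry": ["Department A", "Department B"],
    "Enterprise Email": ["All departments"],
}


def impacted_by(failed_ci: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every CI that directly or indirectly depends on failed_ci."""
    # Invert the dependency map so we can walk from infrastructure upward.
    dependents: dict[str, list[str]] = {}
    for ci, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(ci)

    seen: set[str] = set()
    queue = deque([failed_ci])
    while queue:  # breadth-first walk up the hierarchy
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def business_impact(failed_ci: str) -> dict[str, list[str]]:
    """Map each impacted critical service to its customers."""
    hit = impacted_by(failed_ci, DEPENDS_ON) | {failed_ci}
    return {name: SERVICE_CUSTOMERS[name] for name in sorted(hit) if name in SERVICE_CUSTOMERS}


print(business_impact("pdu-02"))
```

The same traversal supports both uses named in the benefits list: notifying customers during an outage and assessing the business impact of a proposed change to a CI.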
ENTERPRISE BLACKBERRY
Datacenter Rack to Circuit Breaker to Power Distribution Units (PDU) Model
Part 2 will be presented by Maziar Adl
Real World Proactive ITIL Continuous Improvement Practices Part 2 Maziar Adl
Today's Discussion
- Importance of having a quality mindset
- An approach to establishing a design organization
- Examples of implementing design practices in the real world
- Results and takeaways
- Why preventive practices and a culture of quality are essential to the success of your business
A Note about Quality
Total Quality Management (TQM): an integrative philosophy of management for continuously improving the quality of products and processes (Wikipedia)
Without a focus on quality, long-term success in service or manufacturing is questionable:
- Japan's success in manufacturing
- The trade deficit and the Malcolm Baldrige Act during the 1980s in the US
ITIL bases many of its principles and practices on TQM:
- Fitness for Use: Juran on the definition of quality
- Plan, Do, Check, Act: Deming on continuous improvement
Definitions of Quality
Quality definitions vary among the gurus:
- Fitness for use (Juran)
- Conformance to requirements (Crosby)
- The loss a product causes to society after being shipped, other than losses caused by its intrinsic functions (Taguchi)
Transforming Service Delivery
Problem organization: execution of projects with little planning or dedicated staff, resulting in:
- Firefighting
- Unpredictable services and service levels
- Cost pressures
The Organization
Design and Planning:
- Overall Architecture and Technologies / Standards*
- Capacity Requirements
- Availability Requirements
- Training Requirements
- POC Requirements
- Design Diagrams and Requirements
- MOUs/SLRs
- Success Criteria
- TCO Calculations
- Alternatives
- Backup Requirements
- Process / Procedure Requirements
- Monitoring Requirements
- Access Requirements
- Standard Requests
- Engineering and Ops review and acceptance
- Continuous improvement or scalability plans
- Approval of the design from Peer Groups
- Approval of the design from Chief Architect
- Approval of the design from Customer
Build and Transition:
- Design Validation / Review and Acceptance
- Procurements
- Configuration and Installation
- Update and Audit of Asset Lists/CMDB
- RFC Execution and Coordination/Communication
- Rollout Plan
- Backout Plan
- Risk Assessment for RFC
- Ops Training and Handoff
- QA and Validation of Systems
- Feedback to Design Team
- Approval of Rollout from Ops
- Approval of Rollout from Customer
- Approval of Rollout from Design Group
Operations:
- Support and monitoring during cool down/warranty period
- System acceptance
- Registering application in ops (contact information, service levels, monitoring and backup information, new or changed procedures, access levels)
- Update procedures
- System lockdown
- QA and Validation of Monitoring, Backups and Access
- Assist in Negotiating SLRs/SLAs
- Freeze and Warranty Period Management
Design Group Mission
Reduce the cost of downstream deployment and maintenance by conducting analysis of systems, designing solutions, and informing and seeking approval from customers on new and upcoming solutions.
- Prevent problems and incidents from happening in the first place
- Support fitness for use
- Optimize costs
- Reduce rework
[Diagram: Design Service Solutions shown alongside the Strategy, Transition, and Operations lifecycle stages.]
Design processes shown: Capacity Management, Availability Management, Catalog Management, IT Service Continuity Management, Security Management, Service Level Management, Supplier Management
Operations processes shown: Incident Management, Problem Management, Access Management, Event Management, Request Fulfillment
Service Design Package
- 7 opportunities seized
- 4 opportunities would have been too late and more costly to detect and fix
- Holistic approach to service
- Includes processes and people, not just technology
- Is reviewed by executives as well as operational staff and engineers
- A living document throughout the service lifecycle
- Subject to the change control process
Capacity Report
- Focus on the vital few
- Published monthly
- One person accountable for preparation
- Data collected from all other organizational groups, with engineers in each area responsible for providing data
- Aligned with the Service Catalog
- Includes a summary and a detailed analysis statement
- Reviewed and approved by: CTO, Account Executive, Sr. Management
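Capacity Report - Summary
[Table: monthly capacity status by service area. Columns: Service Area, Apr, May, Jun, Jul, Aug, Sep. Rows: Enterprise Network, Enterprise Internet, Enterprise Email, Server Hosting, Colocation, Enterprise Storage. Footnote markers (1), (2), and (3) appear in the table.]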
Example 1: Storage Capacity
[Chart: Enterprise Mail Storage Capacity. Series: Storage Total and Capacity Limit; vertical axis from 500,000 to 2,000,000.]
Benefits:
- Optimum re-order points
- Time for planning
- Transparency and executive briefing
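A re-order point of the kind this example refers to can be estimated by fitting a straight line to monthly usage and projecting when it reaches the capacity limit. The sketch below is one simple way to do that and is not the method used in the presentation: the usage figures, the limit, the lead time, and all function names are hypothetical, and real storage growth is rarely perfectly linear.

```python
from typing import Optional


def linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for values observed at months 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two monthly observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_limit(values: list[float], limit: float) -> Optional[float]:
    """Months until the trend reaches the limit, or None if usage is not growing."""
    slope, _ = linear_trend(values)
    if slope <= 0:
        return None
    return max(0.0, (limit - values[-1]) / slope)


def should_reorder(values: list[float], limit: float, lead_time_months: float) -> bool:
    """True when remaining headroom is no longer than the procurement lead time."""
    remaining = months_until_limit(values, limit)
    return remaining is not None and remaining <= lead_time_months


# Hypothetical monthly storage totals and capacity limit (same unit for both).
usage = [1_000_000, 1_050_000, 1_100_000, 1_150_000]
capacity_limit = 1_500_000

print(months_until_limit(usage, capacity_limit))
print(should_reorder(usage, capacity_limit, lead_time_months=6))
```

Comparing the projected months of headroom with the procurement lead time is what turns the monthly chart into a planning signal for the executive briefing.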
Example 2: Virtual Cluster Failovers
[Chart: Memory Usage Trend - by Cluster, for Cluster 1 through Cluster 10; vertical axis from 0% to 140%.]
Benefits:
- Reduce risk of failure in failover
- Increase up-time
- Optimize load
- Transparency
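One way to read a memory usage figure above 100% is as the load the surviving hosts would carry after a failover. The sketch below illustrates that check under stated assumptions: the presentation does not say how its percentages were computed, and the cluster sizes, memory figures, and function name here are hypothetical.

```python
def usage_after_failover(used_gb: float, hosts: int, gb_per_host: float,
                         failed_hosts: int = 1) -> float:
    """Memory usage as a percentage of the capacity left after hosts fail."""
    surviving = hosts - failed_hosts
    if surviving <= 0:
        raise ValueError("no surviving hosts to fail over to")
    return 100.0 * used_gb / (surviving * gb_per_host)


# Hypothetical clusters: (memory in use in GB, host count, GB per host).
clusters = {
    "Cluster 1": (700, 4, 256),
    "Cluster 2": (600, 3, 256),
}

for name, (used_gb, hosts, gb_per_host) in clusters.items():
    projected = usage_after_failover(used_gb, hosts, gb_per_host)
    verdict = "at risk" if projected > 100 else "ok"
    print(f"{name}: {projected:.1f}% after losing one host ({verdict})")
```

A cluster whose projected figure exceeds 100% cannot absorb the loss of a host, so it is a candidate for rebalancing or added capacity before a failure occurs.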
Example 3: Enterprise Bandwidth to the Internet
[Figure panels: Raw Data; Information]
Benefits:
- Study patterns of business behavior
- Understand thresholds and performance requirements
- Increase awareness throughout the organization
Results
- Prevention of incidents and project rework
- High availability on virtual infrastructure
- Cost control due to forecasting and optimization in purchasing
- Increased transparency resulting in improved decision making across the organization
- Improved decision making
- Improved visibility into patterns of business behavior
- Increased staff awareness of quality and process control
Keys to Success
- Culture of quality, change in mindset
- Executive support
- Start small and grow
- Think big
- Keep the momentum
- Communicate, communicate, communicate
The Culture
- Focus on the vital few
- Understand the needs of the customer
- Review and improve the process, not the product
- Consider service as a whole, not just technology
- Reduce cost of quality
- Upstream studies of requirements and business patterns help eliminate problems and incidents downstream
- Seek opportunities for optimizing cost and avoiding problems
Q&A
Biographies
Mickey Nakamura is a Transformation Executive at Xerox Services responsible for leading service strategy, design, transition, and the transformation of new technology enterprises and organizations, aligning managed services with strategic initiatives and objectives for private sector, federal, state, or local government. He has 22 years of award-winning leadership, transformation, and service experience in information technology, wireless technology, and telecommunications. He is an expert in managed services and business outsourcing with extensive experience developing and implementing ITIL best practices in enterprises that range from small start-ups to very large, established national Fortune 500 companies and top-secret federal government programs. Contact Info: 858-210-2075 mickey.nakamura@xerox.com
Maziar Adl is a Senior Manager of Network and Platform Services with Xerox Americas Local Government. He has more than 15 years of experience in software development and IT, which includes providing IT and software solutions to Fortune 500 companies with BearingPoint and acting as CEO/CIO for NuCredo LLC, a startup he founded in 2003. Maziar began his career at Xerox (formerly ACS) in 2008, where he managed the Application Services Group and later the Network and Platform Services Group on a major outsourcing contract. His efforts in transforming organizations in alignment with industry best practices, and specifically ITIL, have resulted in many improvements in the overall service. His main focus over the years has been on effective and efficient service delivery through preventive measures and operations excellence. Contact Info: 714-448-0026 maziar.adl@xerox.com