ScrumSense Cape Town February Kanban Systems for Software Engineering David J. Anderson Independent Management Consultant dja@djandersonassociates.com
http://www.limitedwipsociety.org Yahoo! Groups: kanbandev http://www.kanban101.com
April 21-23, atlanta.leanssc.org
Taiichi Ohno TPS = JIT + Deming Kanban is the tool that enables these
Kanban is a hot topic in the Agile community Open Space at Agile 2007 attracted 25 people Yahoo! Group now has 1050+ members and growing Implementations at countless companies since August 2007 and on 5 continents BBC is best known adopter with more than 20 teams in London using kanban Other adopters include Software Engineering Professionals (Indianapolis), BNP Parisbas (London), IPC Media (London), and Robert Bosch Corey Ladas published Scrumban, a book about adding Kanban to Scrum Henrik Kniberg & Matthias Skarin, Kanban and Scrum published in Winter/Spring Davd J. Anderson, Kanban: catalyzing Lean in your technology organization, due in Spring
Features 240 220 200 180 160 140 120 100 80 60 40 20 0 Managing with Queues back in 2004 Device Management Ike II Cumulative Flow Avg Lead Time WIP Time Inventory Started Designed Coded Complete
Immediately focus on reducing waste through improved quality measured as defects per unit of valued work Reduce work in progress Donald Reinertsen s Advice Together these will always have an immediate impact Given the groundwork on tracking and cumulative flow, the elements are in place to implement a kanban system
With Don s advice I developed my Recipe for Success Focus on Quality Reduce Work-in-Progress, Release Often Balance Demand against Throughput Prioritize And for advanced credit Reduce variability and improve the process
What is a kanban (pull) system?
5 Core Properties Definition of Kanban Limit Work-in-Progress Visualize Workflow Measure & Optimize Flow Make Process Policies Explicit Manage Quantitatively
Definition of Kanban (part 2) 5 Emergent Properties Prioritize Work by Cost of Delay Optimize Value with Classes of Service Spread Risk with Capacity Allocation Encourage Process Innovation Use Models * to Recognize Improvement Opportunities
Let s take a break for 15 minutes
Case Study Microsoft 2004/2005 XIT one of Microsoft s 8 IT departments XIT Sustained Engineering Small team Change requests Supports over 80 applications (and growing) Engineering responsibilities moved from Redmond (Washington, USA) to Hyderabad (India) in 2004 Hyderabad vendor is CMMI Level 5 and uses TSP/PSP Initial quality is very high
Change Requests Product Managers 90 80 Change 70 Requests 60 50 Sustained Engineering Supply v Demand 40 30 20 10 0 Jan-04 PM Apr-04 Quarters Supply Backlog Demand Jul-04 Dev Mgr Supply Demand is outstripping supply and the queue is growing 155 Days Dark Days in July 2004 Test Mgr Manager resigns end June 2004 open position Q3 2004 Backlog is 80+ and growing about 20 per quarter User Acceptance Test Lead time is 155 days Customer satisfaction lowest in IT department
Product Managers PM Mgr Change Requests3 Developers, Backlog Estimation (ROM) was Top Priority 3 Testers ROM But 80 Applications Estimation was using 33%-40% of available capacity!!! What happens?... 155 Days ROM Open and Read Source Code Read Application Guide Whole process about 1 day per developer and tester Test Mgr SLA 48 hours to return User a rough Acceptance order Test of magnitude estimate (ROM) All change requests are ROM estimated ROMs are expedited as top priority due to SLA
Budget Funding $600K* $200K $100K Estimates were used to facilitate cost accounting $100K Product Managers Change Requests PM Backlog ROM Dev Mgr 155 Days ROM Test Mgr Use ROM x Fixed Rate per hour to calculate Cost of Change Request Evaluate ROI of request ( Value/Cost ) Assign costs against bucket of funding from User Acceptance Test customer group *Numbers are indicative only
Product Managers Estimates were also used to facilitate monthly rescheduling Change Requests PM But Throughput was approximately 7 per month Backlog ROM Dev Mgr Backlog > 80 Rescheduling at least 70 requests that cannot be worked every month 155 Days ROM Test Mgr Monthly rescheduling meeting Assign urgency to each request on backlog Reschedule entire backlog based on urgency and ROM estimate (every month!) User Acceptance Test
Change Requests Actual effort was miniscule compared to lead time of 155 days Product Managers Change Requests PM Backlog ROM Historical data gathered over 9 months showed that a typical change request took approx 5 business days to process through development Low end was 1 day High end 15 days Dev Mgr 155 Days 25 20 ROM 15 10 Test Mgr 5 0 1 to 2 3 to 5 6 to 10 11 to 15 Effort in Days User Acceptance Test > 15 Dev Test
Product Managers Change Requests PM Backlog ROM Dev Mgr 155 Days Are Estimates waste? ROM Only 52% of requests were actually ever completed Other 48% Too big (bigger than 15 days) Too expensive (low value versus cost) Overtaken by events, application decommissioned before request is processed Test Mgr User Acceptance Test ROMs are taking 40% of capacity but 48% of ROMs represent analysis that is never used beyond estimate, schedule and go/no go decision! Knowledge work is perishable. ROM analysis is done months before work is conducted and there is no guarantee that ROM is conducted by same engineer who will code or test. Conclusion all ROMs are waste
Product Managers Change Requests PM Backlog Could it get worse? Expediting PTC Non-code Fix ROM Dev Mgr ROM Test Mgr User Acceptance Test Production Text Change E.g. 155 Days graphical changes, data changes, anything that didn t require a developer Must be expedited Need to make formal QA pass
State Model WIP limit initially 8 = 3 (dev) + 5 (q) WIP imit initially 8 = 3 (test) + 5 (q) XIT Change Request More Info Analysis New Info Received Transferred to Proj Team Pending Future Project By Design Duplicate Won t Fix No Repro Backlog Closed Approved On Hold Re-opened Dev Queue Dev Abandoned Started Resolved Failed Failed, Scope Changed Released Resolved (needs exception) New Started Test Queue Test UAT Queue UAT SOX On Hold Passed Started Complete Production Queue Passed
Change Requests PM Product from Test not Backlog Dev Managers PTC Expedite Kanban 8 cards (3 WIP 5 Queue) Development Kanban Typically enough for WIP + 7 days Test Kanban Typically enough for WIP + 7 days Pace line at rate of consumption Intervention 1 Pace the Line from Development Kanban 8 cards 25 Days Local Mgr User Acceptance Test At times of high expediting levels, kanban insures that line is paced Reduces lead time by insuring single-tasking Focuses customer acutely on selection of highest priority (urgency) requests for insertion into empty buffer slots
Product Managers Change Requests PM Stop cost accounting Kanban 8 cards (3 WIP 5 Queue) Intervention 2 Stop Estimating Kanban 8 cards No such thing as a cost of change request Costs are fixed PTC Expedite ROM activity abandoned Freed up 40% capacity Instant boost to productivity numbers Local Mgr Edge cases Funding Backlog is spent with vendor on 12 month 25 contract Days and paid out on monthly burn rate All changes would be treated equally for cost purposes Too big (take risk, identify once in development) Too expensive (don t care) Following Deming s advice manage for the normal and treat exceptions as exceptional Based on average of 5 business days through development User Acceptance Test
Product Managers Change Requests PM Backlog Intervention 3 - Balancing the Line PTC Expedite Kanban 9 cards (4 WIP 5 Queue) Kanban 8 cards 25 Days Local Mgr Kanban system and eradication of estimation should introduce slack in test Historical data suggested ratio of dev:test is 2:1 but XIT SE has 1:1 ratio (July 2004) Visited India to observe team after first two interventions Confirmed belief that test was not fully utilized Ask vendor to reallocate resource from test to dev to 4:2 Note: Change to kanban size for Dev Instant improvement in productivity (36 to 45) 25% User Acceptance Test
Change Requests Intervention 4 - Invest for Yet More Throughput PM PTC Expedite Kanban 11 9 cards (5 (4 WIP 65 Buffer) Kanban 8 cards Product Backlog 25 Days Managers Increase staff to 5:3 dev:test ratio in July 2005 Local Mgr Having eliminated waste and balanced the production process Removed ROM estimate waste Removed slack from test And changed policies that were causing muda to occur Ceased cost accounting removed ROM estimate waste Re-wrote SLAs with customer Changed prioritization and delayed commitment Total Improvement of 155% in productivity, but more was needed Maintains balance in line gains a total 217% productivity improvement since July 2004 Note further adjustment to kanban limit User Acceptance Test
Zero Backlog Nov 22 nd 2005 After 12 months team hits zero backlog of change requests November 2004 backlog is 80 and trending upwards A combination of falling demand and 200% productivity improvement meant supply was greater than demand and the backlog was depleted Team wins XIT Team Excellence Award for H2 2005
Throughput $7500 60 50 40 30 20 10 17 0 Sep-04 Dec-04 and Cost Lowest cost Highest Productivity per person Mar-05 45 Jun-05 $2900 Shortest lead time Highest customer satisfaction 56 Zero Backlog achieved Some idle time Lowers metrics Sep-05 $3300 Dec-05 45
Mar-04 May-04 Jul-04 Sep-04 Nov-04 Jan-05 Mar-05 May-05 Jul-05 Sep-05 Nov-05 Change Requests 5 Dev Productivity per Resource Dev Test Productivity (per resource) 23 Quarters 12 15 15 9 5 0 25 20 10 Zero Backlog achieved Some idle time Lowers metrics
Calendar Days 180 160 140 120 100 80 60 40 20 0 Why Lead Time is the best metric XIT-SE TTR From 5 Months To 3 Weeks in 5 Quarters New Manager No More ROMs New Prioritization Process Flow Management Lowest cost Highest Productivity per person FY04 Q4 FY05 Q1 FY05 Q2 FY05 Q3 FY05 Q4 FY06 Q1 FY06 Q2 Quarter Changed dev : test ratio from 3:3 to 4:2 Shortest lead time Highest customer satisfaction Other Sev TTR Sev 1 Zero Backlog Increased dev : test from 4:2 to 5:3
Let s take lunch
Kanban board and daily standup meeting were introduced in early February 2007 to add a sense of urgency and team collaboration More personal responsibility and accountability Resulted in better visual control Enabled more selforganization Less management supervision Better productivity Spontaneous quality circles and frequent Kaizen events
WIP Limit regulates work at each stage in the process WIP limits create a kanban pull system & a white board provides visualization of flow Flow from Engineering Ready to Release Ready Pull
5 4 4 3 2 2 Input Queue Analysis In Prog Done Dev Development Ready In Prog Done Kanban board design Build Ready Test Courtesy Olav Maassen QNH Release Ready Stage Prod.
5 4 4 3 2 2 Input Queue Analysis In Prog Done Dev Development Ready In Prog Done Flow Kanban board simulation Build Ready Test Courtesy Olav Maassen QNH Release Ready Stage Prod.
or use one of the many electronic kanban tools now on the market Agile Zen LeanKit Kanban Silver Catalyst Target Process RadTrack Flow.io Kanbanery Jira + Greenhopper v4.0 FogBugz + Plugin VersionOne Rally(?) Mingle (??)
Sticky Buddy scheme was instituted to allow remote workers to keep kanban board up-to-date and synchronized with electronic tracking
Exercise 1 Value-stream Modeling Take 15 minutes Split the room into small groups (approx 5 people per group) Collaborate Pick someone in the group. Ask them to describe a workflow from their organization and map the value-stream Create a kanban board to map and track the value-stream Ask Questions you don t know everything you need to know to complete the exercise! For example, how to model concurrency of tasks?
Change Requests and Production Bugs Customer valued and prioritized by governing board Define Work Item Types
De-lineate non customer-valued work Engineering Defects direct indicator of quality impact on productivity, linked to yellow sticky, not counted against kanban limit
Reserve capacity for essential maintenance & upgrades IT Maintenance Work Technology department reserving capacity for its own maintenance difficult to prioritize with business count against kanban limits
Types can be derived based on Source Work Item Types E.g. Regulatory Requirement, Field Sales Request, Strategic Planning Feature Workflow Size Items that flow through different steps (or in a different sequence) should be different types, tracked on same board Order of magnitude T-shirt sizes
Use swim lanes to lineate type & allocate capacity across lanes Allocation Total = 20 Change Req 12 Maintenance 2 Production Defect 6 5 4 3 4 2 2 Input Queue Analysis In Prog Done Dev Development Ready In Prog Done Build Ready Test = 20 total Release Ready Adapted from design courtesy Olav Maassen QNH...
Exercise 2 Work Item Type Definition Take 15 minutes Same groups as before Collaborate Define a set of work item types that flow through the value-stream defined in exercise 1 Decide how to visually communicate work item type for each item in the system Ask Questions you don t know everything you need to know to complete the exercise! For example, how to model hierarchical work item types?
Change Requests and Production Bugs Customer valued and prioritized by governing board Classes of Service
Process allows for a single Silver Bullet expedite request Silver bullet is hand carried through the system Personal attention from project manager Automatically jumps queues Required specialist resources drop other work in preference to working the silver bullet Release dates may be adjusted to accommodate required delivery date Expediting the Silver Bullet
Fixed delivery dates are allowed but not encouraged There are strict rules for date dependent work items E.g. date constrained by legal commitment with customer or supplier, or regulatory requirement (typically in financial systems) Generally, delivery dates are not allowed
Temporary classes of work may be introduced tactically to maximize exploitation of the system Extra Bug Special class of production bug, worked by slack developer resources and specially selected not to impact solutions analysis. Tested by developers not testers. Allows maximum exploitation for improved throughput
Generally speaking class of service should be defined according to cost of delay function Cost of delay function for an online Easter holiday marketing promotion is difference in integral under two curves
Expedite Example classes of service Fixed Delivery Date Unit step cost of delay function (or near approximation) Standard Class Linear tangible cost of delay function Intangible Class Intangible cost of delay Examples brand identity change, usability fix
Policies for Expedite Class of Service Expedite requests will be shown with white colored cards. Only one expedite request will be permitted at any given time. In other words, a kanban limit of 1 is assigned to the Expedite class of service. Expedite requests will be pulled immediately by a qualified resource. Other work will be put on-hold to process the expedite request. The kanban limit at any point in the workflow may be exceeded in order to accommodate the expedite request. If necessary a special (off-cycle) release will be planned to put the expedite request in production as early as possible.
Policies for Fixed Date Class of Service Fixed delivery date items will use purple colored cards. The required delivery date will be displayed on the bottom right hand corner of the card. Fixed delivery dates will receive some analysis and an estimate of size and effort may be made to assess the flow time. If the item is large it may be broken up into smaller items. Each smaller item will be assessed independently to see whether it qualifies as a fixed delivery date item. Fixed delivery date items will be held in the backlog until close to the point where they must enter the system in order to be delivered ontime given the flow time estimate. Fixed delivery date items will be given priority of selection for the input queue at the appropriate time. Fixed delivery date items will be pulled in preference to other less risky items. In this example, the will be pulled in preference to everything except expedite requests. Unlike expedite requests, fixed delivery date items will not involve putting other work aside or exceeding the kanban limit. If a fixed delivery date items gets behind and release on the desired date becomes less likely it may be promoted in class of service to an expedite request.
Policies for Standard Class of Service Standard class items will use yellow colored cards Standard class items will be prioritized into the input queue based on an agreed mechanism such as democratic voting and will typically be selected based on their cost of delay or business value Standard class items will use First in, First out (FIFO) queuing approach to prioritize their pull through the system. Typically when given an option a team member will pull the oldest standard class item from an available set of standard class items ready for the next step in the process Standard class items will queue for release when they are complete and ready for release. They will be released in the next scheduled release No estimation will be performed to determine a level of effort or flow time. Standard class items may be analyzed for size. Large items may be broken down into smaller items. Each item may be queued and flowed separately
Policies for Intangible Class of Service Intangible class items will use green colored cards. Intangible class items will be prioritized into the input queue based on an agreed mechanism such as democratic voting and will typically be selected based on their intangible business value. Intangible class items will be pulled through the system in an ad hoc fashion. Team members may choose to pull an intangible class item regardless of its entry date so long as a higher class item is not available as a preference. Intangible class items will queue for release when they are complete and ready for release. They will be released in the next scheduled release. No estimation will be performed to determine a level of effort or flow time. Intangible class items may be analyzed for size. Large items may be broken down into smaller items. Each item may be queued and flowed separately. Typically, the preference would be to put aside an intangible class item in order to process an expedite request. It is therefore sensible and shows a good spread of risk when intangible class items are flowing through the system.
Use Color to de-lineate class of service and allocate capacity across classes to manage risk Allocation 1 = 5% 4 = 20% 10 = 50% 6 = 30% 5 4 3 4 2 2= 20 total Input Queue Analysis In Prog Done Dev Development Ready In Prog Done Build Ready Test Release Ready Adapted from design courtesy Olav Maassen QNH...
Exercise 3 Classes of service Take 15 minutes Same groups as before Discuss the different classes of service that might apply Draw some example cost of delay curves Show how cost of delay is driving your choice of classes of service Define policies for processing each classes of service Decide how to visually communicate classes of service note that class of service should be unambiguous from work item type
Kanban tickets hold a lot of information that enable decentralized control and local decision making when deciding priority of items to pull through the system Electronic ID number Date Accepted clock starts on SLA Assigned engineer Issue attached to change request indicates management attention required Hard delivery date for regulatory, legal, or strategic reasons Signifies item that has exceeded SLA indicates that item should be prioritized if possible
Exercise 4 Design Work Item Tracking Card Take 15 minutes Same groups as before Collaborate Design the layout of a kanban wall card What information will you and the team need in order to adequately self-organize and flow work through the system Ask Questions you don t know everything you need to know to complete the exercise! For example, how will pull prioritization be decided?
WIP limits should be set based on policies and empirically adjusted up/down over time WIP Limit defined via policies tuned to organizational maturity
Examples of WIP Limit Policies Each person should work on no more than 2 items at a time Hence, limit = number of people x 2 Prioritization cadence and coordination meeting will be once per week Hence, input queue must be large enough for one week of pull Mathematically, input queue should be 3 sigma upper control limit of mean throughput (rate of pull)
General Observations about Kanban Limits Lower limits are better Less work-in-progress (WIP) Lower cycle times More agile More value delivery Tight lower limits inflict pain on the organization Tight limits will cause work to stall (no flow) Requires good skill in issue management & resolution (impediment removal) Requires capability in root cause analysis and resolution Lower maturity organizations should have looser limits Higher maturity organizations tighter limits
Kanban limits can be grouped across work columns and queues (or buffers) Kanban Limit of 4 for analysis and analysis complete queue Kanban Limit similar policy for development and development complete
Exercise 5 Setting Kanban Limits Take 15 minutes Same groups as before Collaborate Discuss policies for queues Discuss policies for work columns Discuss organizational maturity Set limits according to agreed policies and maturity Ask Questions you don t know everything you need to know to complete the exercise! For example, would separating out different sizes of work item, allow us to reduce queue sizes?
Scaling Kanban
Major Project with two-tiered kanban board using swim lanes for feature sets Swim lanes control WIP limit Requirement WIP (green) controlled by number of lanes
Small generalist team assigned to each swim lane responsible for feature breakdown and flow Feature Team names of team members assigned to swim lane
Cross-functional resources are shown with small orange tickets attached to features Shared resources e.g. enterprise data architect
Handling Cross-team Dependency Tag Work Items with Shared Resource Dependency Clearly mark as impeded Provides visibility onto the problem Provide shared resource with separate Kanban board/system Allow several sources of dependency to compete for scarce resource Treat Integration Items as Fixed Date class of service Base fixed date on agreed integration point
End of Day 1 Retrospective Take 5 minutes From today s tutorial, what have you learned? What questions do you have for tomorrow? Where do you believe Kanban would be useful? In what situations might Kanban be inappropriate or sub-optimal to implement? Why?
Models (to identify improvement opportunities)
3 Sources of Improvement Manage Bottlenecks (improve flow, deliver more value) Improve Economic Costs (reduce waste/overheads) Reduce Variability (Chance & Assignable Cause)
Economic Costs (waste identification)
What is efficiency?
What s the most efficient way to paint a fence?
Setup Focusing on efficiency produces better cost accounting results for large batch sizes Q Queue Queue Task Queue Task Time Task Wait Wait Task W Wait Cleanup
Because the transaction costs (in this case, the setup and cleanup) can be amortized over a greater number of value items (in this case, painted sections of fence)
But there are also coordination costs in knowledge work problems Setup Q Queue Queue Task Communication Queue Task Time Task Wait Wait Task W Wait Cleanup
And in knowledge work problems, coordination costs grow non-linearly with batch size Setup Q Queue Queue Task Communication Queue Task Time Task Wait Wait Task W Wait Cleanup
The prevailing paradigm in Western management was that overheads (transaction and coordination costs) were inevitable and efficiency was achieved through economies of scale (i.e. large batch sizes)
Sources of Waste Transaction Costs Coordination Costs Failure Load
Waste is Waste Thinking of startup, coordination and delivery tasks as waste is perplexing for most knowledge workers. Call them costs instead
cost Transaction Costs Transaction Costs Coordination Costs Value-added Work Failure Load time
Bad Good
Sources of Economic Costs Transaction Costs Coordination Costs Failure Load
Take 15 minutes Exercise 6 Cadence What s the correct release cadence for your kanban system? What are the transaction costs of a release? (training, deployment, governance) and the lead times involved (e.g. training lead time)? What is the appropriate input / prioritization cadence? How quickly are things changing? (in the business) If you reduce waste how might this affect cadence? Does it make sense for release cadence and prioritization to be the same?
Exercise 7 Economic Cost Identification Take 15 minutes Same groups Identify transaction costs in your project work Identify coordination costs How much of the total effort on projects, do you estimate is transaction or coordination costs (rather than value-added) effort? Estimate Failure load internal (bugs) and external (escaped) sources Draw a sketch of economic cost model based on data
Focus on Quality (Reduce Failure Load)
Too much WIP increases defect rates in a non-linear fashion
Defects have an opportunity cost Blue tickets are defects defects increase WIP, reduce the team s ability to work on valuable new features and lengthen lead time
Quantity of blue tickets on the board is an immediate indicator of development quality that is impeding flow of customer valued work and reducing throughput Engineering Defects direct indicator of quality impact on productivity
Defects have an opportunity cost In turn defects increase WIP, lengthen lead times and may result in yet more defects
Features Track WIP! WIP is directly related to Lead Time and Quality 240 220 200 180 160 140 120 100 80 60 40 20 0 10-Feb Device Management Ike II Cumulative Flow 17-Feb 24-Feb Lead Time WIP 2-Mar 9-Mar Time 16-Mar 23-Mar Inventory Started Designed Coded Complete 30-Mar
9-Oct 23-Oct 6-Nov 20-Nov 4-Dec 18-Dec 1-Jan 15-Jan 29-Jan 12-Feb 26-Feb 11-Mar Features Too much WIP, 3 month lead time led to a 30x reduction in quality compared to 1 week lead time 175 150 125 100 75 50 25 0 Project B Cumulative Flow Lead Time Time Inventory Started Designed Coded Complete
Defects have an opportunity cost In turn defects increase WIP, lengthen lead times and may result in yet more defects Too much WIP increases complexity and adds additional communication overhead
Defects have an opportunity cost In turn defects increase WIP, lengthen lead times and may result in yet more defects Too much WIP increases complexity and adds additional communication overhead By far the easiest way to increase throughput and reduce lead time is to focus on quality
Focus on Quality Measure, Track and Manage WIP Employ policies to minimize defects Design and Code reviews Static and Dynamic Code Analysis Test-first development Automated unit tests Continuous integration Formal system testing by professional testers Develop and test in small batches Provide developers with defect feedback as early as possible and as often as possible Enforce as much quality discipline with your configuration management tool as possible E.g. no check-in without passing unit tests
Exercise 10 Policies for Quality Take 15 minutes What policies might you set to positively affect quality? How will these policies affect workflow? Will kanban limits have to be adjusted? Which policies might be overridden when required? Can quality policies be related to classes of service? Ask Questions you don t know everything you need to know! For example, how should defects be handled in the workflow?
Manage Bottlenecks
Known max productivity Working Code Manage bottlenecks to increase productivity Idea Analysis Design Code 1. Identify - Current CCR is System Test 30 per month 2. Exploit - Testers relieved of all non-essential tasks, extra PMs assigned to complete administrative tasks, analysts assigned to future test plans 3. Subordinate - Requirements release restricted to ~30 per month Quality ~90 ~80 ~50 ~50 Error Acceptance Test Error System Test Unit Test Error ~80 ~30 ~40 ~50 Quality Schedule 4. Elevate - Plan to recruit temporary testing staff immediately Schedule How do I trim 25% off the schedule for a project going through this system?
Iteration 1 Burndown
Iteration 1 Cumulative Flow
Burndown versus Cumulative Flow Comparison
CR Only Business encouraged to re-triage backlog CR, Bugs and PDUs WIP growth due to additional resource allocation (good) and some sloppy management of kanban limits (bad) Cumulative Flow
Looks like a bottleneck Non-instant Availability Same thinking process applies Management approach is similar but policies will be different depending on type of bottleneck CCR vs NIA
Exercise 6 Bottleneck Management Take 15 minutes Same groups as before Where do you think there are bottlenecks in your workflow? Is it a CCR or an NIA type bottleneck? What actions would you take to maximize the utilization of your bottleneck resource? What policies are required to subordinate everything else to your exploitation decision? What policies would you remove or change? What new policies must you introduce?
Understanding Variability
How do event planners do it? Scheduling Wimbledon isn t an exact science Games last different lengths of time and weather conditions can stop play altogether but the Men s Final always happens on the 2nd Sunday If only we could project manage like this! Wimbledon comes in on time every year Except 2004 when it rained on Sunday and the final was on Monday ;-)
Working Code Where is the constraint now? Idea Analysis Design Code ~90+-10 ~80+-20 ~50+-10 ~50+-5 ~50+-10 ~50+-5 Error Acceptance Test System Test Unit Test 1. Variability variation in Design, Code or Unit Test can move the constraint 2. Plan Insecure Plan based on System Test as CCR is insecure 3. Reduce Variability Change methods in Design, Code & System Test to reduce variability 4. CCR has Narrowest Variability System Test as CCR should have narrowest variability Error Error ~80+-10 ~40+-15 ~40+-5 ~50+-5
Quantity of pink issue tickets on board directly indicates flow impacting problems that need attention from management Issues are the exception attached to work items that are blocked for external reasons and call attention to problems preventing smooth flow
Issue Management Cumulative Flow
Exercise 7 Classifying Variability Take 15 minutes Same groups as before List the policies under your control that affect the time taken to complete value-added work e.g. Code Inspections, Customer Review What levers can you pull and how might they affect risk? e.g. removing code inspections List typical events out of your control that randomize your plans and schedules What actions might you take to mitigate the effects of these external events Can classes of service be used to help with risk mitigation?
Kaizen Culture
Map the value stream and track work on a white board Hold a standup meeting every day in front of the board
or create an application that looks like a whiteboard but provides the archival advantage of an electronic system
Kanban uses a daily standup meeting as a central enabler of a Kaizen culture In this example more than 40 people attend a standup for a large project with 6 concurrent development teams. The meeting is usually completed in approximately 10 minutes. Never more than 15.
Manage quantitatively and objectively using only a few simple metrics Quality WIP (work-in-progress) Lead time (against target and DDP) Issue & Blocked Work Throughput Cycle Time Efficiency Across these: Trend Variation
Hold a monthly Operations Review and present all the data, invite anyone who wants to come
Kanban sets an expectation of High Maturity Root cause analysis on defects stop the line CAR Kaizen events OID
Educate the workforce to recognize process problems that affect performance
Look how the board has changed by March! Empirically adjusted Kanban limits reacting to industrial engineering issues. Much neater presentation pride in the process is forming
And again in April, more changes to Kanban limits and forward extension of the process to business analysis
Waste bin spontaneously introduced by team to visually communicate rejected CRs that wasted energy and sucked productivity
A report was created to detail rejected or cancelled work items (muda)
Kanban has allowed us to observe known industrial engineering issues Expediting impacts throughput and lead time Tends to increase WIP Overly large CRs cause ragged flow, blow out lead time Larger variation in CR size has required larger queues and buffers extending lead time Non-constraints exhibit ragged flow behavior due to non-instant availability e.g. integration build Bottlenecks create ragged flow (e.g. Test) but also experience idle time from upstream ragged flow from non-instant availability stations (e.g. Build) Big items are now broken up, temporarily breaks the Kanban limit but pull system means no new items enter WIP until overflow is depleted. Result is smoother flow even with big items
Spontaneous Quality Circles started forming Kanban board gives visibility into process issues ragged flow, transaction costs of releases or transfers through stages in process, bottlenecks Daily standup provides forum for spontaneous association to attack process issues affecting productivity and lead time For example, 3 day freeze on test environment was a transaction cost on release that caused a bottleneck at build state. This was reduced to 24 hours after a 3 person quality circle formed to investigate the policies behind the freeze. Result was improved smooth flow resulting in higher throughput and shorter lead time
Other spontaneous quality circle kaizen events Empirically adjusted kanban limits several times E.g. test kanban too small, causing ragged flow UAT state added Prompted by test who were experiencing slack time Expanded kanban limit on Build Ready state, added Test Ready state Introduced to smooth flow post release due to environment outage transaction cost Introduced kanban board, daily standup, colored post-it notes for different classes of service, notations on the post-its Poor requirements causing downstream waste resulted in an upstream inspection to eliminate issues with poorly specified requests
In general, empirical observation of ragged flow or visibility of waste generates a quality circle resulting in a kaizen event
September 2007 Business Analysis and Systems Analysis merged eliminating 25% of lead time consumed as queuing waste
And the process spread inside Corbis
And externally at companies like Yahoo!
this one on the Mash social network team
And the technique is being introduced to major projects with much longer time horizons. This example has a monthly integration event rather than a release every two weeks
5 months later significant changes are evident
Major project with two-tiered kanban board
Major Project with two-tiered kanban board using swim lanes for feature sets
Less mature major project in trouble adopts kanban to bring a focus to daily routine and visibility to work-inprogress to team and management
Next Day Team is maturing quickly and has refactored the board with swim lanes for functional areas
More and more reports were demanded to facilitate management decisions. In this case, new reports to facilitate weekly prioritization
Completed items are collected on the wall to the right hand side of the board. Visual indicator of cumulative total of value delivered Management instituted a bottle tops saving scheme where completed items are traded against morale events to celebrate success E.g. movie tickets, bowling evening, trip to Gameworks
Executive Dashboard
Days SLA 60.0 50.0 40.0 30.0 20.0 10.0 0.0 Mean Lead Time Trend Mean Lead Time Trend Dec Jan Feb Mar Apr May CRs Bugs Combo
CRs & Bugs 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 # CRs 3.5 3 2.5 2 1.5 1 0.5 0 1 2.5 2 1.5 1 0.5 0 8 15 22 29 36 43 50 Lead Time Distribution Days Lead Time Distribution 57 64 Due Date Performance Detail Majority of CRs range 30 -> 55 Outliers 71 78 Days 85 92 99 106 113 120 127 134 141 148 MARCH APRIL
Take 15 minutes Exercise 12 Metrics What metrics would you track? Who is the audience for the metrics? Can you identify day-to-day (leading indicator) metrics used for management intervention? Versus weekly, monthly, quarterly (lagging indicator) metrics used for executive reporting, customer satisfaction and assessment of continuous improvement?
Culture Change Summary Trust, empowerment, objective data measurement, collaborative team working and focus on quality Policy Changes Late-binding release scope, no estimating, latebinding prioritization Cycle time, Release Cadence and Prioritization Cadence are decoupled Cross-functional collaboration Previously unheard of VP level selfless collaboration on business priority Self-regulating process robust to gaming and abuse Continuous Improvement Increased throughput, high quality, process continually evolving, kanban limits empirically adjusted
And finally, staff take a pride in their achievements
Day 2 Retrospective What will you change? Take 5 minutes What have you learned today? What one thing will you implement as a change in your organization when you get back to the office this week? How far do you think Lean/Kanban can take your organization? Take 5 minutes to discuss your results and share with the group
Thank you! dja@agilemanagement.net http://www.agilemanagement.net/
David Anderson is a thought leader in managing effective software teams. He leads a consulting firm dedicated to improving economic performance of knowledge worker businesses improving agility, reducing cycle times, improving productivity and efficiency in technology development. He has 25+ years experience in the software industry starting with computer games in the early 1980 s. He has led software teams delivering superior productivity and quality using innovative agile methods. He developed MSF for CMMI Process Improvement for Microsoft. He is a co-author of the SEI Technical Note, CMMI and Agile: Why not embrace both! David s book, Agile Management for Software Engineering Applying the Theory of Constraints for Business Results, introduced many ideas from Lean and Theory of Constraints into software engineering. David was a founder of the Lean Software & Systems Consortium, a not for profit dedicated to promoting better standards of professionalism and effectiveness in software engineering. Email dja@agilemanagement.net About