Benefits of Root Cause Analysis

Root cause analysis (RCA) is a methodology used to solve problems at their root, rather than just fixing the obvious symptoms.  RCA is often equated with a kaizen improvement process, and rightly so, as it often digs into possible organizational change rather than localized optimizations.  The benefits of RCA are that it uncovers relationships between the causes and symptoms of problems, works to solve issues at the root itself, and provides tangible evidence linking causes, effects and solutions.

If a seamstress makes a shirt and one sleeve is longer than the other, the easy fix is to fire the seamstress and hire a new one.  However, the next seamstress makes the exact same shirt, with the exact same flaws.  Because RCA was not performed, we have introduced rework cycles, waste, lost revenue and increased overhead.  Had an RCA kaizen event been triggered, we could have started with the “Why”s.  Why was the sleeve too long?  Because the seamstress sewed it that way.  Why did she sew it that way?  Because that’s what the plans said to do.  Why did the plans call for one sleeve longer than the other?  Because that’s what the measurements were.  Why were the measurements off?  Because the person who recorded them measured from the top of the shoulder for one sleeve and from the bottom of the shoulder for the other.  Why did he measure from two different points of reference?  Because he was not adequately trained.  Why was he not adequately trained?  Because we assumed his resume was accurate and his training adequate.  The point is, we’ve identified the need to either train the person who takes measurements or hire a new one, not to hire a new seamstress.

If you count the “why”s above, you’ll see we’ve dug six deep.  The 5 whys is a technique popularized by Taiichi Ohno (the father of the Toyota Production System) and is used to dig to the root of problems.  However, it has its flaws.  The flaws aren’t really in the process itself, but in how an individual or team carries it out.  If at any point the wrong question is asked, it may send you off on a tangent that won’t lead to the root cause, so you end up acting on the wrong cause.  Keep in mind that you don’t have to stop at the fifth why; the idea is to dig as far as necessary to identify the root cause of the observable effects.
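The “keep asking why” loop can be sketched in Python using the shirt example.  This is a minimal illustration under assumptions: the `CAUSES` map and the `ask_whys` helper are hypothetical, and in a real RCA the answers come from people investigating, not from a lookup table.  The point of the sketch is that the loop stops when no deeper cause is known, not at a fixed count, and that it guards against circular answers.

```python
# Hypothetical encoding of the shirt example: each effect maps to the answer
# given when someone asks "why?" about it.
CAUSES = {
    "one sleeve is longer than the other": "the seamstress sewed it that way",
    "the seamstress sewed it that way": "the plans called for it",
    "the plans called for it": "the recorded measurements were off",
    "the recorded measurements were off": "he measured from two different reference points",
    "he measured from two different reference points": "he was not adequately trained",
    "he was not adequately trained": "we assumed his resume was accurate and his training adequate",
}


def ask_whys(symptom: str, causes: dict) -> list:
    """Keep asking "why?" until no deeper cause is known; return the chain."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in causes:
        cause = causes[chain[-1]]
        if cause in seen:  # circular answer: stop instead of looping forever
            break
        chain.append(cause)
        seen.add(cause)
    return chain


chain = ask_whys("one sleeve is longer than the other", CAUSES)
for depth, answer in enumerate(chain[1:], start=1):
    print(f"Why #{depth}: because {answer}")
print("Root cause:", chain[-1])
```

If any single answer in the map is wrong, every later step follows the wrong branch.  That is the tangent risk described above.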

Above, we’ve turned the process from “finger-pointing” at the seamstress into a learning process that identifies the root cause of the problem.  By getting to the root cause of the initially identified problem, we may also have solved other issues, such as pant inseams of different lengths, again caused by the bad measurements.

A clothing factory’s quality metrics do not allow these unusable clothes to ship, and if you hold yourself to those kinds of quality metrics, you need improvement processes in place to quickly and thoroughly identify the roots of problems, without spinning your wheels firing and hiring seamstresses.

Software is much more difficult, however, because tangible artifacts are not always present.  We often find bugs and hack together fixes, rather than performing RCA to understand the full chain of events and relationships so that the actual root cause is identified and fixed, which may also prevent other bugs from appearing in the future.  In that sense, RCA is both reactive and proactive.

RCA is a process that brings lasting organizational improvements in many situations and, most importantly, a learning process to follow for thoroughly understanding the relationships among causes, effects and solutions.  By practicing RCA, you stop acting on merely possible causes and defer your response to the last responsible moment, when the actual root cause of an effect has been identified.

Posted in Agile, Alt.Net, Lean | 11 Comments

The purpose of value stream maps is not the value stream maps

Value stream mapping is an activity, stemming from lean methodologies, used to capture and report on processes from beginning to end.  Once the value stream map is complete, you have a visual representation of the process and its activities, such as inventory pulls, kanban signals, “milk runs”, buffers, load leveling and the value processes themselves.

Once a VSM is complete, it becomes a communication artifact and, to a certain degree, obsolete.  You now have the information you went in search of, and the VSM is nothing more than a representation of that knowledge.

Identifying the steps that create value and the steps that create waste is the ultimate goal of producing a VSM.  The purpose of the map isn’t the map, but the learning process.  It’s the granular evaluation of specific steps within a process that produces the knowledge required to trigger kaizen events and make process improvements.

Once the knowledge is acquired and understood and a kaizen event triggered, future state maps can be produced to show where improvements can be made: increasing (where applicable) and streamlining value-adding processes, and removing wasteful activities.  The future state map shows a more desirable VSM, which is used to produce an action plan for introducing those improvements into the process.

I was in the process of putting together VSM templates for Excel, and a colleague of mine made a good point when I asked him for input.  Paper is portable.  Pencil can be erased.  The longer you sit and evaluate a process, the more you find yourself erasing and making changes.

That being noted, and taking into consideration what I said above, I’m not sure I see value in digital VSMs at all.  They are short-lived artifacts (when acted upon) that don’t justify the effort it takes to screw with a tool beyond standard pencil and paper.

Posted in Agile, Alt.Net, Lean | 1 Comment

Identifying Waste, the Lean Way

As mentioned in a previous blog post, waste elimination is usually the most obvious and least resisted way to improve value and flow in a product.  So I’m going to jump right into some of the forms of waste in software product development that are usually easy to identify and evaluate, and whose solutions are easy to put in place and sustain.  I’m not going to cover all forms of waste, just the most common ones.

Bugs and Defects

A bug, in simplest terms, is any behavior of the system that does not meet the expectations of the customer.  Defects have a broader scope, and their definition crosses into other forms of identifiable waste.  A defect may be a symptom not of the product itself, but of the team, communication patterns, materials transportation, or processing and production.  Missing a deadline is a defect, but it is a defect of a larger entity than the product itself, as is missing information (most commonly found in requirements gathering, prioritization and feedback loops).  Bugs cause rework.  Rework is waste.  Retesting is waste.  Every single bug and defect, no matter how large or small, adds to the overall production cost of a product, because ideas and code that were addressed in a previous cycle have to be revisited.

Movement of Artifacts and Materials in Excess (Transportation)

Unnecessary movements are wasteful activities, and by unnecessary, I mean movements that do not add value.  Routing is probably the largest cause of excess transportation.  Routing of materials usually has room to be streamlined and improved and should be looked at promptly.  Signature requirements, too many eyes involved, processes out of sync or sequence, lengthy lead times, report approvals, data replication techniques, poor configuration management, lengthy feedback loops: all of these usually require too many movements to achieve a goal and end up increasing the cost of production.  One thing that’s not so obvious?  Office layout.  Yes, the layout of the facility teams work in very often causes excess movement of resources.  Think about that one.

Waiting

Waiting overlaps a bit with excess movement, but it has its own caveats.  Feedback loops are where most software teams have room to reduce waiting.  In manufacturing, it’s usually materials and inventory shortages that cause excess waiting.  When you wait, you waste.  That’s obvious and simple.  Waiting pops up in so, so many ways in software development: document generation and reviews, bugs and defects, resource shortages (not enough staff), somebody on vacation when collective code ownership is not practiced, no continuous integration, no unit tests, wrong kanban levels, manual and lengthy deployment scenarios, poor workflow… I could go on and on.  As mentioned at the beginning of this paragraph, in software it’s usually feedback loops, and more commonly feedback from your own product than from the customer.  Feedback in the form of unit tests, integration tests, build servers, metrics reports, code reviews, a customer on the team, etc., is critical to eliminating the waste generated by waiting.

Backlogs (Excess inventory)

In manufacturing, this would be identified as excess inventory.  In Scrum, an obvious place to identify “inventory excess” is a complete backlog.  In lean, any backlog item that cannot be identified as a customer need based on actual, current demand is excess and should be removed.  If it’s important, it will resurface later.  Managing those excess backlog items is a wasteful activity and can lead to less than ideal planning and processing.  Removing backlog items that are not pulled by current demand may surface issues that need to be addressed, such as hidden processes and scheduling that are causing waste.

Over-processing

Over-processing is most commonly found in the form of code that does not pertain directly to the needs of the customer.  Behavior-driven design is how I eliminate this wasteful activity: by practicing it, I know that each and every line of code can be traced back to a customer requirement.  Code written because you think you might need it later, system reconfigurations, unnecessary refactorings, code written in a very elaborate way when something simple would suffice; these are all wasteful activities that do not add value to the product.

Over-production

This ties into over-processing a bit, but production is a result of processing, so when you have over-production, you’ve already over-processed.  Building modules that are not required and completing items that are not part of the customer’s current demand are forms of over-production.  If you have built a feature the customer has not requested, it becomes excess inventory in the product that has to be maintained (carrying costs).  Not only that, it’s much more likely to need rework when and if the customer does request that feature, if it doesn’t become obsolete altogether.

Posted in Lean | 4 Comments

An entry into lean

I, like many others, have been deep into lean methodologies such as kaizen, kanban, 5S, value streams and lean in general.  As I continue to learn and practice these things, I’m going to start publishing, much like Laribee is doing with his focus on Kanban, in order to gain feedback and ideas.  I’m going to cover things in a somewhat more general manner than approaching just one methodology, but I hope to hit on them all.

Today I just want to give a primer into lean so those of you who haven’t done much reading into it have a foundation from which we will build.

When you hear lean, it’s difficult not to throw the word efficiency into every sentence under discussion.  Efficiency is a metric and is easily measured for most things.  If your car is rated at 20 MPG and you are achieving only 15 MPG, then your car is 75% efficient with respect to its gas mileage.
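Written out, the gas-mileage example is simply the ratio of actual to rated performance:

```latex
\text{efficiency} = \frac{\text{actual}}{\text{rated}} \times 100\%
                  = \frac{15\ \text{MPG}}{20\ \text{MPG}} \times 100\%
                  = 75\%
```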

Efficiency has its counterpart, however, and that is waste.  Many people will tell you lean is about eliminating waste, but that is not entirely true.  Lean is about improving efficiency, and waste elimination is typically the least expensive, most effective way to improve efficiency, but it’s not the only one.  So don’t focus solely on waste elimination, but on improving efficiency itself.  For developers, obvious waste is easy to spot: phone calls, emails, ESPN.com and the like are the main culprits.  You have to stop and evaluate each activity, asking yourself: does this activity help me achieve my goal for the day as it pertains to adding value to my client/product/service, etc.?  Identify – correct – sustain.

Kanban, as Dave has been implementing it, is one system that helps with waste elimination by providing a feedback loop and a continuous workflow, in which downstream stages pull work from upstream based on evaluations of current status.  Other methodologies have different principles behind them but aim at the same goals, and I will be talking about those, as each is a form of lean used to Identify Inefficiencies – Make Corrections – Sustain Positive Process Improvements.
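To make “pulling downstream from upstream” concrete, here is a minimal Python sketch of a pull system with work-in-progress (WIP) limits.  The stage names, the limits and the `Stage`/`pull` helpers are hypothetical, not Dave’s actual board.  The sketch also assumes every item sitting in a stage is finished and ready to move, so it only shows how WIP limits throttle the flow.

```python
from __future__ import annotations

from collections import deque


class Stage:
    """One column on a hypothetical kanban board."""

    def __init__(self, name: str, wip_limit: int | None = None):
        self.name = name
        self.wip_limit = wip_limit  # None means no limit (e.g. backlog, done)
        self.items = deque()

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.items) < self.wip_limit


def pull(board: list[Stage]) -> list[str]:
    """Run one pull cycle.  Walk from the most downstream stage back upstream,
    so capacity freed at the end of the line propagates in a single pass; each
    stage with spare capacity pulls the oldest item from the stage before it."""
    moves = []
    for i in range(len(board) - 1, 0, -1):
        upstream, downstream = board[i - 1], board[i]
        while downstream.has_capacity() and upstream.items:
            item = upstream.items.popleft()
            downstream.items.append(item)
            moves.append(f"{item}: {upstream.name} -> {downstream.name}")
    return moves


board = [Stage("Backlog"), Stage("Develop", wip_limit=2),
         Stage("Test", wip_limit=1), Stage("Done")]
board[0].items.extend(["A", "B", "C", "D"])

first = pull(board)   # Develop fills to its limit of 2; nothing else can move
second = pull(board)  # Test pulls one item, freeing a Develop slot for the next
for move in first + second:
    print(move)
```

However full the backlog is, nothing enters Develop beyond its limit.  Work moves only when a downstream stage signals it has room, rather than being pushed downstream as soon as it exists.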

Lean literature is everywhere.  Take some of the keywords I’ve talked about here and search the web for them.  You’ll find lots of great information.

Posted in Lean | 2 Comments

The Fly in the Soup of the Iteration

Where do bugs fit into your iterations?  This is a discussion I’ve had on many occasions with many different people.  Laribee mentioned they work bugs as soon as they come in.  I believe Bellware told me the same thing.  Provost and Newkirk both told me they get bugs prioritized into the backlog along with everything else as they come in, and they get estimated and put into a specific iteration along with the stories.

I’ve tried it both ways.  On the same project, even.  We have gone from working them into the iteration, to doing them as they come in, back to working them into the iteration.  We weren’t as successful as we would have liked either way.  Bugs are always hard to deal with, because you don’t know exactly what’s wrong, and everything is guesswork until you go in and look.  By the time you’ve gone in and looked for the problem in order to estimate how long it’s going to take to fix, you find that the majority of your bugs need a change to a single line of code (of course, write your unit test first to reproduce the bug, then fix the code).  90% of your time is spent figuring out what is causing the bug.  That is where bug estimates have to focus: on the figuring out, not necessarily the fixing.
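The “write your unit test first to reproduce the bug” advice might look like the following Python sketch.  The `page_count` function and its bug are hypothetical.  What matters is the shape of the work: the test encodes the bug report and fails against the old code, and the fix itself is a single line.

```python
import unittest


def page_count(total_items: int, page_size: int) -> int:
    """Pages needed to show total_items at page_size items per page."""
    # Hypothetical buggy original: `return total_items // page_size`,
    # which dropped the last, partially filled page.
    return (total_items + page_size - 1) // page_size  # the one-line fix


class PageCountRegressionTest(unittest.TestCase):
    """Written first, from the bug report; the first test failed against the old code."""

    def test_partial_last_page_is_counted(self):
        # The reported bug: 11 items at 10 per page showed only 1 page.
        self.assertEqual(page_count(11, 10), 2)

    def test_exact_multiple_is_unchanged(self):
        self.assertEqual(page_count(20, 10), 2)

    def test_no_items_means_no_pages(self):
        self.assertEqual(page_count(0, 10), 0)


if __name__ == "__main__":
    # argv is pinned so the runner ignores any outer command-line arguments.
    unittest.main(argv=["regression"], exit=False)
```

Most of the effort goes into pinning down the failing case in the first test.  The fix is trivial once that case is known.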

So how do you handle this?  In reality, you have to mix a few different strategies together.  Some bugs will come across in the bug tracker as high priority: things that are hurting the production system and must be resolved within the next few days.  Many others will not be as high priority.  What my team does now, and it has worked very well, is this: when a critical bug comes in, the next person to finish a task picks up the bug and works it.  All other bugs go into the backlog.  At any given time we have between 10 and 15 bugs in our backlog.  These get worked on Friday mornings and afternoons as the iteration winds down, stories are completed and resources become available.  This has had very little impact (no noticeable impact, actually) on our velocity and backlog curve.  So in the end, we are not really estimating bugs.  For metrics purposes, we only track a bug count, not an estimated velocity of what it would take to resolve a stream of bugs.
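The triage policy above can be sketched in Python.  This is an illustration under assumptions, not the team’s actual tooling: the `Bug` and `Team` names are hypothetical, and the Friday bug work is modeled simply as picking up backlog bugs once no stories remain in the iteration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Bug:
    title: str
    critical: bool  # e.g. hurting production, must be fixed within days


@dataclass
class Team:
    critical_queue: deque = field(default_factory=deque)
    backlog: list = field(default_factory=list)

    def report(self, bug: Bug) -> None:
        """Triage on arrival: critical bugs jump the line, the rest wait."""
        if bug.critical:
            self.critical_queue.append(bug)
        else:
            self.backlog.append(bug)

    def next_work(self, stories: deque) -> str | None:
        """Called whenever someone finishes a task."""
        if self.critical_queue:  # a critical bug goes to the next free person
            return self.critical_queue.popleft().title
        if stories:  # otherwise keep working the iteration's stories
            return stories.popleft()
        if self.backlog:  # stories done: work down the bug backlog
            return self.backlog.pop(0).title
        return None

    def bug_count(self) -> int:
        """The only bug metric tracked: open bugs, not estimated effort."""
        return len(self.critical_queue) + len(self.backlog)


team = Team()
team.report(Bug("Typo on invoice footer", critical=False))
team.report(Bug("Orders fail to save in production", critical=True))
open_bugs = team.bug_count()

stories = deque(["Story 12", "Story 13"])
order = [team.next_work(stories) for _ in range(4)]
print(open_bugs, order)
```

Nothing in the sketch estimates a bug.  Critical bugs preempt stories only at task boundaries, which is why this kind of policy can leave velocity largely undisturbed.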

Short iterations, another topic I’ve been meaning to talk about, certainly help with quick turnaround on high-priority bugs.  If you are working four-week iterations, the turnaround on a bug is horrendous unless you branch, resolve, merge… puke.  Either way, it’s still several days in the worst case to get the fix out, which is no different from just working one-week iterations.

Posted in Agile, Alt.Net, Lean, Scrum, TDD | 5 Comments