Hate to say it, but software has bugs. Back in the old days, when computers were full of tubes and took up entire buildings, a “bug” could literally be an insect or small rodent interfering with the internals of the computer (chewing a wire, knocking something loose, etc.).
Nowadays, bugs in software come from a different living creature: instead of cockroaches and mice, it’s humans. Sometimes, it’s the human using the software incorrectly. Other times, it’s the human who configured it.
Sometimes, it’s the human who wrote the software. We developers at Qualtrax try to write the best code we can – but alas, we are human, and as such, we err. But, as we are human, we always improve. We learn from our mistakes, and we do better the next time around.
When a defect gets out and a customer reports it, we assess how severe the bug is and prioritize it. The most severe bugs go into what we call the “Firelane”. This means that the developers stop whatever they are working on and immediately start fixing the problem. This is a hard choice to make, because it means that our schedule and plan for what we are going to complete during our 2-week sprint could get thrown off, but a bad bug must be squashed, so we get our fly swatters and Raid out and get ready to do the work.
This might blow some people’s minds, but at Qualtrax, we use Qualtrax to monitor the quality of Qualtrax. What that very confusing sentence means is that when we start work on a firelane, one of us logs into our own installation of Qualtrax and starts a workflow that we have built, “Qualtrax Defect Development Post-Mortem”. We record vital information such as which customers reported the problem, what versions of Qualtrax they are using, and what the defect’s symptoms are. With this information, we dive into the code and figure out what is going wrong. Once we discover the root cause, we apply a code fix, add automated tests, and do manual testing to ensure as best we can that the defect will not recur. All of this takes anywhere from a few hours to a week, depending on the complexity and difficulty of the defect.
Once we have made the fix in code, we deliver what is called a “Hotfix”: a small release of Qualtrax that only addresses the firelane defect. We deliver this hotfix to the customer and keep in contact with them to make sure that they do not see the defect again. This is very important, because sometimes what seems like one defect is actually several all partying together. While we may have busted one because he had the incriminating evidence in his hand, and sent him home to his mother, the others (who were sneaky and kept quiet while their friend was getting caught) are still getting away with all sorts of chaos, which is the exact opposite of what a Quality Manager wants.
We wait for two weeks. If, in that time, the defect has not recurred and no other hidden defects have arisen, we officially mark the code as fixed. But wait! We’re not done! A good post-mortem doesn’t just fix the bug; it fixes the process that let the bug get out in the first place. The developers sit down and talk about how the defect got out. Did we forget to test some part of the product? Was there a configuration combination (browser, database, server OS, client tool version, 3rd party tool version, upgrade from a particular version to the version with the defect, and so on) that was overlooked in testing? We talk about all of these factors and more to figure out what happened (or didn’t happen) on our end that let this defect get to a customer.
Then, we come up with very specific action plans to prevent these types of defects from getting out in the future. If we missed testing a part of the product, we make sure that part is explicitly in our test plans the next time around. If we missed a configuration combination, we make sure that combination is hit in our next batch of release testing. We look at each action item from this meeting and come up with a timeline for implementation (this one should be done within the week, that one the next time we release, etc.). Once the action item’s timeline has been met, we make sure that it was implemented and assess how effective it was. This assessment occurs for each individual action item, on its own timeline, using workflow expiration (a wonderful Qualtrax feature!).
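For the developers in the audience, here is a rough sketch of how the two-week wait and the per-item timelines could be modeled in code. This is not how Qualtrax workflows are actually built or configured; the class and field names (`PostMortem`, `ActionItem`, `due`, and so on) and the example dates are made up purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ActionItem:
    """One follow-up from the post-mortem meeting (hypothetical model)."""
    description: str
    due: date                          # timeline agreed on in the meeting
    implemented: bool = False
    effective: Optional[bool] = None   # stays None until the item is assessed

    def needs_review(self, today: date) -> bool:
        # An item comes up for review once its own timeline has passed
        # and nobody has assessed its effectiveness yet.
        return today >= self.due and self.effective is None


@dataclass
class PostMortem:
    """A single firelane defect and its follow-ups (hypothetical model)."""
    defect: str
    hotfix_delivered: date
    action_items: List[ActionItem] = field(default_factory=list)

    def fix_confirmed_on(self) -> date:
        # The fix is only marked as done after two quiet weeks.
        return self.hotfix_delivered + timedelta(weeks=2)

    def items_to_review(self, today: date) -> List[ActionItem]:
        # Each item is checked on its own timeline, not all at once.
        return [item for item in self.action_items if item.needs_review(today)]


# Example with made-up data.
pm = PostMortem("Example defect", hotfix_delivered=date(2024, 1, 1))
pm.action_items.append(
    ActionItem("Add the missed area to the test plan", due=date(2024, 1, 8))
)
pm.action_items.append(
    ActionItem("Cover the missed configuration in release testing", due=date(2024, 3, 1))
)

print(pm.fix_confirmed_on())  # 2024-01-15
print([item.description for item in pm.items_to_review(date(2024, 1, 10))])
```

On January 10 in this example, only the first action item is up for review; the second one waits for its own date, which is the same idea as each item expiring on its own timeline.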
Some of you might recognize this as eerily similar to a CAPA-type (corrective and preventive action) process. Congratulations, you are correct! Since Qualtrax is ISO 9001 certified, we adhere to the same quality standards, and keeping track of our defects, how we address them, and what we do to keep such defects from being released again is very important, both to our ISO auditors and to our customers.
And that’s what to expect when things don’t go as expected.