We’ve all been in that situation where you’re streaming along, banging out some sick code for a new feature. You’re excited to see how many minds you’ll blow when you demo it. And then you see a blip on your radar. Something is going wrong with your production system. Oh no. You give it a cursory look and you realize you don’t understand what’s going on. At this point, you know your day is shot. Why? Because you already know you’ll need to spend too much time debugging.
What, Specifically, Is Debugging?
I want to be clear when I say we spend too much time debugging. I don’t just mean firing up our system locally, loading in some data, and stepping through some breakpoints. Debugging here means the list of steps taken to diagnose and solve a problem. This often does mean firing up a local debugger of some sort, but it could be as simple as investigating a couple of log messages and restarting a service. So when I say we spend too much time debugging, I mean we spend too much time looking at logs. And painfully scrounging through database entries, relentlessly querying ad hoc bits of data. And, yes, we also slow down to a crawl when we spin up the debugger on our local machine.
Why Is It So Painful?
You may be curious as to why debugging is so painful. You may, as a developer, have experienced the derailing pain of stopping your work to debug some vicious defect. But it may also be hard to articulate exactly what took so long or why it was so annoying, other than the change of plans.
But I think we can find much of the root of our pain in the fact that debugging is mostly dealing with the unknown. And we normally don’t thrive with large chunks of uncertainty. We like to quickly make the uncertain certain, to feel we have control over the situation. It’s rare to enjoy digging around the log files for hours, searching for what seems like a needle in a haystack.
The pain of this uncertainty while debugging is only increasing. Software systems are getting more complex year by year. We’re not keeping up with ways to abstract that complexity into simple things we can grasp. If we are to have a chance to overcome this complexity, we must find ways to reduce or mitigate the pain of debugging.
Is There Any Hope?
Fortunately, we do have tools in our toolbox to drive away the pain and uncertainty of debugging. We have the means to drastically reduce the time it takes to go from uncertainty to certainty. We can also delegate much of our efforts to the computer. And we can automate away some of the painful bits and let our software do the legwork for us.
Scan for Weaknesses
As I just stated, software can catch many of its own mistakes better than we can. Static analyzers like CodeIt.Right can scan for common problems, such as code that is vulnerable to null reference exceptions. This drastically cuts down on how often you’ll have to hunt down obscure bugs. There is almost an inverse relationship at work: many weaknesses that are hard for humans to find are easy for computers to find. If we let our software hunt down these hard-to-see weaknesses, we can then focus on the more obvious things to debug in our system.
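To make the idea concrete, here is a minimal sketch in Python rather than .NET. It assumes a type checker such as mypy stands in for the analyzer; the function names and the lookup table are hypothetical and exist only for illustration. The unsafe function runs fine for known IDs and fails only at runtime for unknown ones, which is exactly the kind of weakness a tool spots before you do.

```python
from typing import Optional

# Hypothetical lookup table, used only for illustration.
_USERS = {1: "ada", 2: "grace"}


def find_user(user_id: int) -> Optional[str]:
    """Return the user name, or None when the ID is unknown."""
    return _USERS.get(user_id)


def greeting_unsafe(user_id: int) -> str:
    # A type checker such as mypy flags this line: find_user() may return
    # None, and None has no .title() method.
    return "Hello, " + find_user(user_id).title()


def greeting_safe(user_id: int) -> str:
    name = find_user(user_id)
    if name is None:  # the guard the analyzer pushes you to write
        return "Hello, stranger"
    return "Hello, " + name.title()
```

Running the checker over the file reports the unsafe call without executing anything, while the guarded version passes cleanly.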
Make Your Logs Searchable
One of the more painful bits of debugging is searching for needles in a haystack of log messages. I’ve seen very few software systems that are well equipped to provide log information geared toward debugging. This is ironic, considering we even have a common logging level called DEBUG. Yet, the art of designing insightful software is rare.
See You Later, Aggregator
How searchable are your logs? Do you just scrounge around in the array of application and server files? If so, I recommend procuring a searchable log storage system. The more searchable your logs are, the quicker you’ll debug from uncertainty to certainty. If you have a distributed system, procure a full-on log aggregator.
However, these tools on their own won’t be enough. You also have to treat log messages and formatting as first-class output from your application. This means you need a consistent way to format and parse your messages. This will drastically reduce the time it will take you to index and query your logs with your fancy log storage. I highly recommend structured logging to make this happen.
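As one possible shape for structured logging, the sketch below uses only Python's standard library to emit one JSON object per log line. The names JsonFormatter and build_logger, the "context" key, and the sample fields are assumptions of this sketch, not part of any particular logging product.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        # The "context" key is a convention of this sketch, not of logging.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines (to stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


logger = build_logger("orders")
logger.info(
    "order placed",
    extra={"context": {"order_id": "A-100", "duration_ms": 42}},
)
```

Because every line parses as JSON with the same top-level keys, a log store can index fields such as level or order_id directly instead of relying on free-text search.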
What to Log?
Finally, log everything. I mean everything. Environment. Application IDs. Request IDs. Workflow correlation IDs. Request inputs. Duration. You get the picture. You can’t predict what will break in your application, or else you would have already tested for it. Therefore, having information you may never need serves you better than needing information you don’t have. Let’s put it this way: if you find yourself continually needing to add more logging context when debugging most defects, then you probably need to build in more context up front on every user story.
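One way to get that context onto every message without repeating it at each call site is to attach it automatically. The sketch below assumes Python's standard logging and contextvars modules; the field names, build_context_logger, and handle_request are hypothetical stand-ins for whatever IDs and handlers your system really has.

```python
import contextvars
import logging
import time
import uuid

# Hypothetical context fields; use whichever IDs your system really has.
request_id = contextvars.ContextVar("request_id", default="-")
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class ContextFilter(logging.Filter):
    """Copy the current request context onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        record.correlation_id = correlation_id.get()
        return True  # never drop the record, only enrich it


def build_context_logger(name: str, stream=None) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s request_id=%(request_id)s "
        "correlation_id=%(correlation_id)s %(message)s"
    ))
    handler.addFilter(ContextFilter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, payload: dict, correlation=None) -> dict:
    """Stand-in request handler that logs inputs and duration."""
    request_id.set(uuid.uuid4().hex)
    correlation_id.set(correlation or uuid.uuid4().hex)
    started = time.perf_counter()
    logger.info("request received input=%r", payload)
    result = {"echo": payload}  # placeholder for the real work
    elapsed_ms = (time.perf_counter() - started) * 1000
    logger.info("request finished duration_ms=%.1f", elapsed_ms)
    return result


logger = build_context_logger("fulfillment")
handle_request(logger, {"sku": "X1"}, correlation="wf-7")
```

Every message logged while the request is being handled carries the same request and correlation IDs, so a single query on either ID pulls back the whole story of that request.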
These actions can take your debugging steps from days to minutes. I’ve seen this happen firsthand.
Discover the Hidden Persona
You may find yourself developing APIs for your customers, thinking about how to make pieces of your system usable and discoverable. This is noble work and it’s good to have a customer-centric mind-set. A common tactic for tackling this is to start with a set of personas that represent your customer base. This lets you humanize them and can make it easier to meet their needs. But there is often a forgotten persona that exists in almost every software system: the support person.
Who Is the Support Person?
Whether it’s in a separate ops team or a role you all share on a cross-cutting team, we often forget about the support person until it’s too late. They never show up in user stories. They get sliced out of minimum viable products. And then the stories show up in production and the pain of forgetting this persona rears its head. You find it hard to learn what happened to a workflow. Or you string together a mess of database records to pinpoint an intermittent bug. You finally give up and spin up the debugger, hoping that you can somehow reproduce the defect.
It’s easy to let this happen, but we can stop it. We can bring the support person to the forefront. First, make them a first-class citizen of your personas. And this should be easy. After all, as the debugger, you are the support person. Once you understand that, build the needs of the support person into user stories. Or even create support-focused user stories as part of your epics. Add acceptance criteria here and there. For example, if you have a user story that kicks off a fulfillment workflow, perhaps build a simple status endpoint that lets you see the current state of the workflow. It doesn’t have to be publicly visible or filled with bells and whistles. Just something bare bones that lets you audit your system.
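A bare-bones version of that status endpoint might look like the sketch below. It is framework-free Python under stated assumptions: the in-memory WorkflowTracker, the state names, and the /internal/workflows/<id>/status route mentioned in the comments are all hypothetical, and a real system would read from its own workflow store.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class WorkflowRecord:
    workflow_id: str
    state: str = "pending"
    history: list = field(default_factory=list)


class WorkflowTracker:
    """In-memory stand-in for wherever workflow state really lives."""

    def __init__(self) -> None:
        self._records = {}

    def transition(self, workflow_id: str, state: str) -> None:
        record = self._records.setdefault(
            workflow_id, WorkflowRecord(workflow_id)
        )
        record.state = state
        record.history.append(
            {"state": state, "at": datetime.now(timezone.utc).isoformat()}
        )

    def status(self, workflow_id: str):
        """Body of a hypothetical GET /internal/workflows/<id>/status.

        Returns an (HTTP status code, JSON body) pair so that any web
        framework can wrap it.
        """
        record = self._records.get(workflow_id)
        if record is None:
            body = {"error": "unknown workflow", "workflow_id": workflow_id}
            return 404, json.dumps(body)
        return 200, json.dumps(asdict(record))


tracker = WorkflowTracker()
tracker.transition("wf-7", "payment_captured")
tracker.transition("wf-7", "shipped")
code, body = tracker.status("wf-7")
```

The response carries both the current state and the history of transitions, which is usually enough for a support person to see where a workflow stalled without opening the database.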
We have the chance to stop debugging from getting the best of us. As humans, we’re grossly inefficient debuggers, slowly slogging through swaths of code. And as systems become increasingly distributed, this slog overwhelms our minds. Unshackle yourself. Let your logs tell you what’s going on. Let static analysis catch problems earlier, before they become debug disasters. And think about that hidden persona, the support person, when designing your APIs. Enjoy the freedom you’ll have to focus on more fun coding, like making your customers happy.