Perhaps you’ve heard of the concept of a source code analyzer before. I expect that many reading this have, at least in passing. But I’d also be willing to bet that the term strikes you as somewhat vague, and that everyone reading will picture something slightly different when they hear it. So today I’ll do what I can to clear that vagueness up.
Static Code Analyzer: the Backstory
Let’s start with origin and motivation. As software developers, we automate things for a living. This gives us the unique ability not only to make others more efficient at doing their jobs but also to turn this same idea on our own work. We constantly seek to automate our own processes.
The code analyzer represents this exact idea. Boiled down to its essence, you can think of it quite simply. I’ll give a hypothetical example. Say you spend a lot of time looking through your code, making sure that all methods and properties have Pascal casing. As you waste another hour doing this, it occurs to you that you could probably write a program to do it automatically, taking your source files as input. And it then occurs to you that such a program would actually be better and more efficient than you at this task. Well, you’ve just conceived of the static code analyzer, if a very rudimentary one.
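That rudimentary analyzer is simple enough to sketch. The article’s examples are .NET-flavored, but for illustration here is a minimal Python script that scans C#-style source text for method declarations and flags names that aren’t Pascal-cased. The declaration pattern and function names are hypothetical simplifications (a real tool would parse the language properly rather than lean on regexes):

```python
import re

# Matches PascalCase: an uppercase letter followed by letters/digits,
# with no underscores (e.g., "ComputeTotal", not "compute_total").
PASCAL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")

# Hypothetical, simplified pattern for C#-style method declarations:
# an access modifier, a return type, then the method name before "(".
METHOD_DECL = re.compile(
    r"\b(?:public|private|protected|internal)\s+\w+\s+(\w+)\s*\("
)

def find_badly_cased_methods(source: str) -> list[str]:
    """Return method names that violate the Pascal-casing rule."""
    return [
        name
        for name in METHOD_DECL.findall(source)
        if not PASCAL_CASE.match(name)
    ]
```

Point it at a source file’s contents and it does in milliseconds what cost you an hour by hand, which is exactly the sentiment that produced the first analyzers.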
Code analyzers emerged as a result of this exact sentiment. In the late 1970s, engineers at Bell Labs created a utility called Lint. Its mission? To parse C code, looking for “likely bugs.” This concept had such a profound effect on the industry and such staying power that you’ve undoubtedly heard of a “linter” for your language of choice. These all descended, conceptually, from that original tool, built nearly four decades ago.
The Proliferation of Code Analyzers
During the time between the release of the original Lint and now, the code analyzer has evolved considerably. Lint and similar programs were known as “first generation” static analyzers. These tools were genuinely useful, but they suffered from a low signal-to-noise ratio: they flagged plenty of false positives alongside the real problems.
Years later, as processing power improved considerably, the second generation of static code analyzer emerged. In the second generation, tools stopped treating the source code as simply text. Instead, these tokenized the code and stored it in semantically contextual representations. In other words, these tools grokked the source code. And they used this understanding to do much more sophisticated vulnerability analyses. But while this cut down considerably on the noise, it didn’t scale terribly well.
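To make the text-versus-tokens distinction concrete, here is a small Python sketch (Python chosen purely for illustration) using the standard library’s `tokenize` module. A plain text search counts every occurrence of an identifier, including mentions inside strings and comments; a tokenizing analyzer counts only the real uses:

```python
import io
import tokenize

def count_identifier(source: str, name: str) -> int:
    """Count genuine uses of an identifier, ignoring matches that a
    plain text search would wrongly hit inside strings and comments."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(
        1 for tok in tokens
        if tok.type == tokenize.NAME and tok.string == name
    )
```

Run it on a snippet like `total = 1` followed by a comment and a string literal that both mention `total`, and the tokenizer reports one use where a text search would report three. That gap is precisely the noise the second generation eliminated.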
As processing power continued to grow by leaps and bounds, a third generation emerged. This generation used abstract syntax trees, better hardware, and more nuanced heuristic approaches to do deeper analysis than generation 1, and with better scale than generation 2. And it’s these third generation tools with which you are familiar.
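As a taste of what AST-based analysis looks like, here is a minimal Python sketch (again, Python stands in for whatever language your tooling targets) using the standard library’s `ast` module. It flags bare `except:` clauses, a semantic property that is awkward to detect reliably with text matching but trivial once you have a syntax tree:

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Flag 'except:' clauses that name no exception type, a common
    source of silently swallowed errors. Returns line numbers."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]
```

Real third-generation tools layer heuristics and inter-procedural reasoning on top of trees like this, but the foundation is the same: operate on structure, not text.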
These generations, however, represent a fairly shallow categorization in and of themselves. They address the advancing state of the art, but not the breadth of tools that we have at our disposal. Today’s static code analyzer has different flavors, and I’ll speak to some of those here. With an industry full of people looking to automate as much as possible, it should come as no surprise that a diverse ecosystem of code analysis options has emerged.
The Cosmetic Checker
When we categorize Lint as a generation 1 tool and say that we’re now on generation 3, that might seem to imply that linters are no longer around. That’s decidedly not the case. Despite the low ratio of signal to noise, these tools were useful and remain useful to this day.
One of the flavors of linter that you’ll see focuses on what I’ll call cosmetic code concerns. An iconic example of this in the .NET world is StyleCop. The rule about Pascal casing that I mentioned earlier would fit right into this style of code analyzer. It analyzes your source code, making sure that it conforms to stylistic standards.
Because it fits into the mold of the original linter (largely text focused), it runs quickly. And if you don’t care for some of its rules, it will also come with a low signal-to-noise ratio.
The Code Metrics Gatherer
The next type of code analyzer gets a little more involved. I’m talking here about an analyzer that gathers and tracks metrics about your source code.
Some of the most prominent examples of code metrics include lines of code per method, lines of code per type, cyclomatic complexity, and coupling. Teams want these statistics about their codebase because they serve as leading indicators of codebase maintainability. In other words, codebases with large types, complex methods, and intense coupling tend to create maintenance headaches. So teams want to keep an eye on these statistics to see whether they’re headed for trouble.
The code metrics gatherer serves mainly to furnish data about your codebase.
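For a sense of how such a gatherer works, here is a simplified Python sketch using the `ast` module. It reports lines of code per function and an approximation of McCabe’s cyclomatic complexity (1 plus the number of decision points); the set of nodes counted as decision points is a deliberate simplification, and real metrics tools count more constructs:

```python
import ast

# Decision points that add a branch; a simplified approximation of
# McCabe's cyclomatic complexity (real tools count more constructs).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def function_metrics(source: str) -> dict[str, dict[str, int]]:
    """Gather lines-of-code and cyclomatic complexity per function."""
    tree = ast.parse(source)
    metrics = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            loc = node.end_lineno - node.lineno + 1
            complexity = 1 + sum(
                isinstance(child, BRANCH_NODES)
                for child in ast.walk(node)
            )
            metrics[node.name] = {"loc": loc, "complexity": complexity}
    return metrics
```

Track numbers like these over time and you get exactly the leading indicators described above: a function whose complexity creeps upward release after release is a maintenance headache announcing itself in advance.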
The Automated Code Review
The metrics gatherer offers data but generally no guidance as to what to do with that data. The automated code review tool, on the other hand, highlights problems and suggests fixes.
This type of code analyzer is as straightforward as it sounds. It has a library of common coding mistakes, and it analyzes your codebase, looking for instances of them. Unlike a cosmetic checker, however, this doesn’t focus exclusively on formatting concerns. It will dive in and identify likely sources of bugs or improper usage.
In many cases, it will even offer to fix these problems for you, where standard fixes apply. So it’s an automated code reviewer and fixer, which is a handy analyzer to have in your tool chest.
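A single rule from such a library can be sketched in a few lines. This hypothetical Python example detects one well-known mistake in Python code, comparing to `None` with `==` or `!=` instead of `is` / `is not`, and applies the standard fix automatically (real review tools use full parsing rather than regexes, and carry hundreds of such rules):

```python
import re

# One hypothetical rule from a library of common mistakes:
# comparing to None with == / != instead of is / is not.
EQ_NONE = re.compile(r"==\s*None\b")
NE_NONE = re.compile(r"!=\s*None\b")

def review_and_fix(source: str) -> tuple[list[str], str]:
    """Return (findings, fixed_source) for this single rule."""
    findings = []
    fixed_lines = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if EQ_NONE.search(line) or NE_NONE.search(line):
            findings.append(
                f"line {lineno}: use 'is None' / 'is not None' "
                "when comparing to None"
            )
        line = EQ_NONE.sub("is None", line)
        line = NE_NONE.sub("is not None", line)
        fixed_lines.append(line)
    return findings, "\n".join(fixed_lines)
```

The two-part return value mirrors the two jobs of this analyzer flavor: report the problem like a reviewer would, and offer the mechanical fix where one exists.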
The Defect Finder
Next up, consider a similar but subtly different code analyzer. The defect finder exists exclusively to track down bugs written into source code. The automated code review will cover a wide variety of issues, but this style of analyzer focuses exclusively on detecting improper runtime behavior via source code.
Reasoning about runtime behavior by examining source code (path analysis) is an extremely hard problem. This style of analyzer often has research roots, and many of these tools are fairly expensive and computationally intensive to run. Still, this tends to represent the cutting edge in the automated detection of defects.
The Vulnerability Detector
The last style of code analyzer I’ll mention is the vulnerability detector. This is an analyzer that focuses specifically on security concerns.
Like the defect finder, this is a true instance of specialization. Security is of such high concern to so many organizations, particularly in the enterprise, that it demands its own set of tooling. Organizations with lots of money and lots to protect are always looking to gain and maintain an edge against hackers and other bad actors. So the demand for the vulnerability detector is intense.
The vulnerability detector works particularly well for seeking out common exploits (SQL injection, buffer overflows, etc.). But it faces a daunting task since many security vulnerabilities are unique and highly context-dependent.
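Here is a deliberately tiny Python sketch of the common-exploit case. It walks a syntax tree and flags calls to a method named `execute` whose query argument is built by string concatenation or an f-string, the classic SQL injection shape, rather than passed as a constant with separate parameters. Treat it as an illustration of the idea; production vulnerability detectors perform genuine taint analysis across functions and files:

```python
import ast

def find_sql_injection_risks(source: str) -> list[int]:
    """Flag calls like cursor.execute(...) whose query argument is
    assembled by concatenation or an f-string instead of passed as a
    constant with bound parameters. Returns line numbers."""
    tree = ast.parse(source)
    risky = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args
                and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))):
            risky.append(node.lineno)
    return risky
```

Note what it cannot do: a vulnerability that depends on application-specific context, rather than a recognizable syntactic shape, sails right past a check like this, which is exactly the daunting part of the task.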
The Importance of the Code Analyzer
I’ve talked through a number of different flavors of the code analyzer, but I should also mention that many, if not most, actually offer more than one of these styles of analysis. CodeIt.Right, for instance, serves primarily as an automated code review tool, but it provides several of these other varieties of analysis as well. My goal here was to help you understand the different styles of code analysis itself, rather than to try to categorize specific tools.
The field of automated code analysis is wide open, and tool authors are doing interesting things. I recommend that you take advantage of that and leverage as many of these tools as you can to help you with your codebase. As software developers, we spend our lives automating other people’s work, but we should make sure that we’re also automating our own as much as we can as well.
Tools at Your Disposal
SubMain offers CodeIt.Right, which integrates easily into Visual Studio to provide a flexible and intuitive automated code review solution that works in real time, on demand, at source control check-in, or as part of your build.