Anatomy of C# Spell Check
With the vast ecosystem of tools and plugins around Visual Studio, the existence of a C# spell check shouldn’t come as a surprise. But maybe it does. It’s entirely possible that this sort of spell checking never occurred to you as something worth doing. Or maybe it just never occurred to you at all.
Personally, I think spell checking your code is most definitely worth doing. I won’t belabor the point here, since I’ve made this case in the past. Suffice it to say that since you can have it so effortlessly, you might as well get your spelling right.
Today, I’d like to talk instead about the problem of building a C# spell checker (or a general purpose spell checker to use inside an IDE). In order to make things easy on you, the spell checker has to do some pretty sophisticated stuff and wrestle with subtle problems.
Spell check in a word processor is relatively easy from an implementation perspective. You assume the person is typing natural language and check all words against a dictionary. But in code? Not so much. Let’s take a look at some of the reasons for that.
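To make the contrast concrete, here is a minimal sketch of the word processor approach: split on whitespace, trim punctuation, and look each word up in a dictionary. The tiny word list, the NaiveSpellCheck class, and the sample line of code (customerName, LoadCustomer) are all made up for illustration; no real spell checker is quite this simple.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveSpellCheck
{
    // Hypothetical stand-in for a real dictionary (assumption for this sketch).
    private static readonly HashSet<string> Words =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "the", "result", "is", "stored", "here"
        };

    // Treats the input as natural language: whitespace separates words,
    // and trailing or leading punctuation is trimmed before the lookup.
    public static List<string> Unknown(string text) =>
        text.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Trim('.', ',', ';', '!', '?'))
            .Where(token => token.Length > 0 && !Words.Contains(token))
            .ToList();

    public static void Main()
    {
        // A plain English sentence made of known words.
        Console.WriteLine(string.Join(" | ", Unknown("The result is stored here.")));

        // A line of code (hypothetical identifiers) run through the same check.
        Console.WriteLine(string.Join(" | ", Unknown("string customerName = LoadCustomer(id);")));
    }
}
```

The English sentence passes cleanly, while every token in the line of code gets flagged, including the keyword, the operator, and the method call with its argument still attached. That is the gap a code-aware spell checker has to close.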
C# Spell Check: the Basic Challenge
Written English (or any natural language) has very specific rules around spelling and grammar. Spaces demarcate words, and periods do the same for sentences. If you spell words incorrectly or have sentence fragments, you have nonsense, as far as the language is concerned.
This is, of course, also true of source code. Spaces separate tokens, and semicolons (or braces) demarcate statements. But whereas precision around English syntax and semantics optimizes for human understanding, these properties in code optimize for compiler understanding. You have to spell and case reserved keywords correctly so that the compiler can identify them. But non-reserved language tokens? The compiler has no opinion or preference, as long as you spell them the same way everywhere you use them.
This creates a rule vacuum of sorts. As I type this sentence, English rules dictate that I capitalize the word “English.” If I wrote a method named GetEnglishEquivalent, the compiler would care not a whit whether I capitalized any or all of those letters. We programmers are left to figure this out by convention.
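Here is a rough sketch of how a checker might lean on that convention: break an identifier into words at underscores and at lowercase-to-uppercase transitions, then look up each piece. The IdentifierSpelling class and its three-word dictionary are hypothetical and exist only for this example. A real tool would also need to handle acronyms, digits, abbreviations, and much more.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierSpelling
{
    // Hypothetical stand-in for a real dictionary (assumption for this sketch).
    private static readonly HashSet<string> Dictionary =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "get", "english", "equivalent"
        };

    // Splits on underscores and wherever a lowercase letter or digit is
    // followed by an uppercase letter (the PascalCase/camelCase convention).
    public static IEnumerable<string> SplitWords(string identifier) =>
        Regex.Split(identifier, @"_|(?<=[a-z0-9])(?=[A-Z])")
             .Where(word => word.Length > 0);

    // Returns the pieces of the identifier that are not in the dictionary.
    public static List<string> Misspelled(string identifier) =>
        SplitWords(identifier).Where(word => !Dictionary.Contains(word)).ToList();
}

public static class Program
{
    public static void Main()
    {
        var names = new[] { "GetEnglishEquivalent", "GetEnglishEquivelant", "getenglishequivalent" };
        foreach (var name in names)
        {
            var bad = IdentifierSpelling.Misspelled(name);
            Console.WriteLine($"{name}: {(bad.Count == 0 ? "ok" : string.Join(", ", bad))}");
        }
    }
}
```

With this sketch, GetEnglishEquivalent passes, GetEnglishEquivelant is caught on its last word, and the all-lowercase getenglishequivalent is flagged as a single unknown word. The compiler would accept all three names, but the checker can only find the words when the casing convention tells it where they begin and end.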
So the core challenge of C# spell check is to join these two worlds and navigate the subtleties of rules versus conventions.