How do you introduce unit testing into a large, legacy (C/C++) codebase?

mpontillo · Apr 14, 2009 · Viewed 16.4k times

We have a large, multi-platform application written in C (with a small but growing amount of C++). It has evolved over the years with many features you would expect in a large C/C++ application:

  • #ifdef hell
  • Large files that make it hard to isolate testable code
  • Functions that are too complex to be easily testable

Since this code is targeted for embedded devices, running it on the actual target involves a lot of overhead. So we would like to do more of our development and testing in quick cycles, on a local system. But we would like to avoid the classic strategy of "copy/paste into a .c file on your system, fix bugs, copy/paste back". If developers are going to go to the trouble of doing that, we'd like to be able to recreate the same tests later and run them in an automated fashion.

Here's our chicken-and-egg problem: in order to refactor the code to be more modular, we need it to be testable first, so we can verify we haven't broken anything. But in order to introduce automated unit tests, we need the code to be more modular.

One problem is that, since our files are so large, a function we want to test often calls another function in the same file, and it's that inner function we need to stub out to make a good unit test. This should become less of a problem as our code gets more modular, but that is a long way off.
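
To make the same-file problem concrete, here is a minimal sketch (all names invented, not from our codebase) of one common workaround: routing the internal call through a function pointer, so a test can swap in a stub without any linker tricks. Plain link-time substitution cannot replace a callee that lives in the same translation unit as its caller, but a pointer seam can.

    /* seam_sketch.c -- invented names (parse_line, lookup_symbol); a
     * function-pointer "seam" lets a test replace a same-file callee. */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    static int lookup_symbol_real(const char *name)
    {
        /* Stands in for something slow or target-only in the real code. */
        return strcmp(name, "known") == 0 ? 42 : -1;
    }

    /* The seam: production code calls through this pointer. */
    static int (*lookup_symbol)(const char *name) = lookup_symbol_real;

    int parse_line(const char *line)
    {
        /* Previously a direct call to lookup_symbol_real(); the linker
         * could never substitute a stub for that. */
        return lookup_symbol(line);
    }

    /* --- unit test side --- */
    static int lookup_symbol_stub(const char *name)
    {
        (void)name;
        return 7;  /* canned result; no hardware, no big tables */
    }

    int main(void)
    {
        lookup_symbol = lookup_symbol_stub;   /* install the stub */
        assert(parse_line("anything") == 7);
        lookup_symbol = lookup_symbol_real;   /* restore the real callee */
        assert(parse_line("known") == 42);
        puts("seam sketch: ok");
        return 0;
    }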

One thing we thought about doing was tagging "known to be testable" source code with comments. Then we could write a script to scan source files for testable code, compile it into a separate file, and link it with the unit tests. We could slowly introduce the unit tests as we fix defects and add more functionality.
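
As a sketch of how lightweight the tagging could stay (the TESTABLE markers below are an invented convention, not any existing tool's syntax), only self-contained code gets tagged, and the extraction script copies just the marked region into the test build:

    /* util.c -- TESTABLE markers are a made-up convention for an
     * extraction script to find; only self-contained code is tagged. */
    #include <stdint.h>
    #include <stddef.h>

    /* TESTABLE-BEGIN */
    uint16_t checksum16(const uint8_t *buf, size_t len)
    {
        uint32_t sum = 0;
        while (len--)
            sum += *buf++;
        return (uint16_t)(sum & 0xFFFFu);
    }
    /* TESTABLE-END */

    /* The rest of the file still depends on the hairball and stays
     * untagged until it is cleaned up. */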

However, there is concern that maintaining this scheme (along with all the required stub functions) will become too much of a hassle, and developers will stop maintaining the unit tests. So another approach is to use a tool that automatically generates stubs for all the code, and link against those. (The only tool we have found that will do this is an expensive commercial product.) But this approach seems to require that all our code be more modular before we can even begin, since only external calls can be stubbed out.

Personally, I would rather have developers think about their external dependencies and intelligently write their own stubs. But stubbing out every dependency of a horribly overgrown, 10,000-line file could be overwhelming. It might be difficult to convince developers that they need to maintain stubs for all their external dependencies, but is that the right way to do it? (One other argument I've heard is that the maintainer of a subsystem should maintain the stubs for that subsystem. But I wonder whether "forcing" developers to write their own stubs would lead to better unit testing.)
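
For illustration, a hand-written stub for an external dependency can be just a few lines that record the call so tests can assert on it; the flash_write() name and prototype here are invented. The test binary links this file in place of the real driver, which works for any call that crosses a file boundary (the same-file case above still needs a seam):

    /* flash_stub.c -- hypothetical stub; flash_write() stands in for a
     * call into another subsystem. The test build links this file
     * instead of the actual flash driver. */
    #include <stdint.h>
    #include <string.h>

    /* Must match the prototype in the real subsystem's header. */
    int flash_write(uint32_t addr, const uint8_t *data, size_t len);

    static uint32_t last_addr;          /* captured for assertions */
    static uint8_t  last_data[256];
    static size_t   last_len;

    int flash_write(uint32_t addr, const uint8_t *data, size_t len)
    {
        last_addr = addr;
        last_len  = len < sizeof last_data ? len : sizeof last_data;
        memcpy(last_data, data, last_len);
        return 0;  /* always "succeed"; a fancier stub makes this settable */
    }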

The #ifdefs, of course, add another whole dimension to the problem.
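
One concrete consequence, sketched with an invented macro name: an #ifdef'd function is really several functions, one per configuration, so any extracted test has to be compiled and run once per configuration (e.g. once with -DPLATFORM_FOO and once without).

    /* ticks.c -- invented example; PLATFORM_FOO is hypothetical. The two
     * builds of ticks_to_ms() are different functions, and each needs
     * its own test run. */
    #include <stdint.h>

    uint32_t ticks_to_ms(uint32_t ticks)
    {
    #ifdef PLATFORM_FOO
        return ticks / 32;   /* 32 kHz tick on this target */
    #else
        return ticks;        /* 1 kHz tick elsewhere */
    #endif
    }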

We have looked at several C/C++ based unit test frameworks, and there are a lot of options that look fine. But we have not found anything to ease the transition from "hairball of code with no unit tests" to "unit-testable code".

So here are my questions to anyone else who has been through this:

  • What is a good starting point? Are we going in the right direction, or are we missing something obvious?
  • What tools might be useful to help with the transition? (preferably free/open source, since our budget right now is roughly "zero")

Note that our build environment is Linux/UNIX-based, so we can't use any Windows-only tools.

Answer

S.Lott · Apr 14, 2009

we have not found anything to ease the transition from "hairball of code with no unit tests" to "unit-testable code".

How sad -- no miraculous solution -- just a lot of hard work correcting years of accumulated technical debt.

There is no easy transition. You have a large, complex, serious problem.

You can only solve it in tiny steps. Each tiny step involves the following.

  1. Pick a discrete piece of code that's absolutely essential. (Don't nibble around the edges at junk.) Pick a component that's important and -- somehow -- can be carved out of the rest. While a single function is ideal, it might be a tangled cluster of functions or maybe a whole file of functions. It's okay to start with something less than perfect for your testable components.

  2. Figure out what it's supposed to do. Figure out what its interface is supposed to be. To do this, you may have to do some initial refactoring to make your target piece actually discrete.

  3. Write an "overall" integration test that -- for now -- tests your discrete piece of code more-or-less as it was found. Get this to pass before you try to change anything significant. (A bare-bones sketch of such a harness, covering this step and step 5, follows the list.)

  4. Refactor the code into tidy, testable units that make better sense than your current hairball. You're going to have to maintain some backward compatibility (for now) with your overall integration test.

  5. Write unit tests for the new units.

  6. Once it all passes, decommission the old API and fix whatever the change breaks. If necessary, rework the original integration test; it tests the old API, and you want to test the new API.
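
For concreteness, here is a bare-bones harness for steps 3 and 5, using nothing but assert(). Every name is a hypothetical stand-in: old_format_msg for the carved-out legacy code, msg_render for its refactored replacement. The stand-in bodies exist only so the sketch compiles on its own; in real life you link the production objects instead.

    /* legacy_tests.c -- sketch of a characterization test (step 3) and a
     * unit test (step 5). All names are invented. */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-ins for the legacy function and its refactored replacement. */
    static void old_format_msg(char *buf, size_t n, int count)
    {
        snprintf(buf, n, "count=%d\n", count);
    }

    static int msg_render(char *buf, size_t n, int count)
    {
        return snprintf(buf, n, "count=%d\n", count) < 0 ? -1 : 0;
    }

    /* Step 3: freeze the behavior the legacy code has today -- right or
     * wrong -- before touching anything. The expected string is whatever
     * the old code actually produced on its first run. */
    static void test_old_api_characterization(void)
    {
        char buf[64];
        old_format_msg(buf, sizeof buf, 3);
        assert(strcmp(buf, "count=3\n") == 0);
    }

    /* Step 5: a real unit test for one of the new, tidy units. */
    static void test_msg_render(void)
    {
        char buf[64];
        assert(msg_render(buf, sizeof buf, 3) == 0);
        assert(strcmp(buf, "count=3\n") == 0);
    }

    int main(void)
    {
        test_old_api_characterization();
        test_msg_render();
        puts("all tests passed");
        return 0;
    }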

Iterate.