When you implement an algorithm that computes an optimal solution (i.e. minimum/maximum with respect to a cost function), how do you test that the solution is actually optimal?
This is the kind of question I face when implementing algorithms in QuickGraph. For example, I was recently looking at the minimum spanning tree of a graph (MST)? While checking that the result tree is a spanning tree is easy, checking that it is minimum is not obvious: nobody has written Assert.IsMinimum yet :). Here are a couple techniques that I found useful along the way:
Input-Output table
The most obvious approach is to pre-compute the result for a number of problems and assert the solution of the algorithm matches. In this MST case, use a small set of graphs for which the MST is well known, and check that the computed MST has the correct weight. This approach will take you only so far and requires a lot of manual work since you need to solve the problem (or find a known solution) for a number of representative cases.
Solution Perturbation
If the algorithm computes a solution that is minimal with respect to a cost function, one can try to perturbate the solution to see if there’s a smaller one. If so, it clearly violates the fact that the solution should be minimal, thus you just found a bug. In the case of MST, this would mean randomly picking edges that are not in the minimum spanning tree, swapping them and evaluate if the result tree is smaller.
This is kind of approach is actually used in optimization where the search space might have local minima (see simulated annealing).
Multiple Implementations
A nice thing about graph problems is that there are usually many (vastly) different ways to solve them. If multiple implementations are available, we can simply compare the result of each algorithm against each other and make sure that they match each other. Each algorithm might have bugs but it is unlikely that they share common ones. Since we have now a good oracle, we can apply this approach that a large number of inputs to increase our coverage.
In the MST case, two popular algorithms are Prim’s and Kruskal’s. Those are 2 different approaches: Prim is built on top of Dijkstra single source shortest path, while Kruskal is built on top of the disjoint-set data structure. By carefully picking the weights of the edges, we can assert different things:
- if the edge weights are all equals, any spanning tree is minimal. So we can compare the result to a depth-first-search algorithm (which can easily compute a spanning tree).
- if some edge weights are different, there may be many minimum spanning tree. In this case, we can still assert that weight of the tree is minimum.
- if all the edge weights are different, then the MST is unique. This fact can be used to precisely pinpoint the differences in solution during testing.
There is a corner case that needs to be checked: if all algorithm are no-op, i.e. they don’t do anything. There solution will always match!
Happy New Year!