# Thursday, January 01, 2009

When you implement an algorithm that computes an optimal solution (i.e. minimum/maximum with respect to a cost function), how do you test that the solution is actually optimal?

This is the kind of question I face when implementing algorithms in QuickGraph. For example, I was recently looking at the minimum spanning tree (MST) of a graph. While checking that the result is a spanning tree is easy, checking that it is minimal is not obvious: nobody has written Assert.IsMinimum yet :). Here are a couple of techniques that I found useful along the way:

Input-Output table

The most obvious approach is to pre-compute the result for a number of problems and assert that the solution of the algorithm matches. In the MST case, use a small set of graphs for which the MST is well known, and check that the computed MST has the correct weight. This approach will only take you so far and requires a lot of manual work, since you need to solve the problem (or find a known solution) for a number of representative cases.
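For instance, a table-driven test could look like the sketch below; WeightedEdge and ComputeMinimumSpanningTreeWeight are hypothetical stand-ins for the graph type and the MST routine under test, not the actual QuickGraph API:

// Table-driven MST test (sketch). WeightedEdge and
// ComputeMinimumSpanningTreeWeight are hypothetical stand-ins for the
// implementation under test.
[TestMethod]
public void MstWeightMatchesPrecomputedSolutions()
{
    var cases = new[]
    {
        new
        {
            // triangle: the MST keeps the edges of weight 1 and 2
            Edges = new[] { new WeightedEdge(0, 1, 1.0), new WeightedEdge(1, 2, 2.0), new WeightedEdge(0, 2, 3.0) },
            ExpectedWeight = 3.0
        },
        new
        {
            // path 0-1-2-3 plus an expensive shortcut the MST must ignore
            Edges = new[] { new WeightedEdge(0, 1, 1.0), new WeightedEdge(1, 2, 1.0), new WeightedEdge(2, 3, 1.0), new WeightedEdge(0, 3, 10.0) },
            ExpectedWeight = 3.0
        },
    };

    foreach (var c in cases)
        Assert.AreEqual(c.ExpectedWeight, ComputeMinimumSpanningTreeWeight(c.Edges), 1e-9);
}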

Solution Perturbation

If the algorithm computes a solution that is minimal with respect to a cost function, one can try to perturb the solution to see if there is a smaller one. If a perturbation yields a smaller solution, it violates the claim that the solution is minimal, and you have just found a bug. In the case of the MST, this means picking random edges that are not in the minimum spanning tree, swapping them in for tree edges, and evaluating whether the resulting tree is smaller.

This kind of approach is actually used in optimization, where the search space might have local minima (see simulated annealing).
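To make this concrete, here is a rough sketch of such a perturbation check for the MST. It assumes System.Linq, a WeightedEdge type with a Weight property, and an IsSpanningTree helper that verifies connectivity and edge count; none of these are actual QuickGraph calls:

// Perturbation check (sketch): for every edge that is not in the tree, try
// swapping it in for each tree edge; if the swap still spans the graph and
// is strictly cheaper, the "minimum" spanning tree was not minimal.
static void AssertNoCheaperSingleSwap(
    ICollection<WeightedEdge> allEdges,
    ICollection<WeightedEdge> treeEdges,
    int vertexCount)
{
    double treeWeight = treeEdges.Sum(e => e.Weight);
    foreach (var candidate in allEdges.Except(treeEdges))
        foreach (var removed in treeEdges)
        {
            var perturbed = treeEdges
                .Except(new[] { removed })
                .Concat(new[] { candidate })
                .ToList();
            // IsSpanningTree is an assumed helper: connected and
            // exactly vertexCount - 1 edges
            if (IsSpanningTree(perturbed, vertexCount))
                Assert.IsTrue(
                    perturbed.Sum(e => e.Weight) >= treeWeight - 1e-9,
                    "a single edge swap produced a cheaper spanning tree");
        }
}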

Multiple Implementations

A nice thing about graph problems is that there are usually many (vastly) different ways to solve them. If multiple implementations are available, we can simply compare their results and make sure they match. Each algorithm might have bugs, but it is unlikely that they share the same ones. Since we now have a good oracle, we can apply this approach to a large number of inputs to increase our coverage.

In the MST case, two popular algorithms are Prim’s and Kruskal’s. They take two different approaches: Prim’s is built on top of Dijkstra’s single-source shortest path algorithm, while Kruskal’s is built on top of the disjoint-set data structure. By carefully picking the edge weights, we can assert different things (see the sketch after the list):

  • if the edge weights are all equal, any spanning tree is minimal. So we can compare the result to a depth-first-search algorithm (which can easily compute a spanning tree).
  • if some edge weights differ, there may be many minimum spanning trees. In this case, we can still assert that both algorithms return trees of the same total weight.
  • if all the edge weights are distinct, the MST is unique. This fact can be used to precisely pinpoint differences between the solutions during testing.
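
A sketch of the cross-check follows; the MinimumSpanningTreePrim / MinimumSpanningTreeKruskal extension method names and signatures are approximations, check the QuickGraph sources for the exact API:

// Cross-checking Prim against Kruskal (sketch; method names and signatures
// are approximate, see the QuickGraph sources for the exact API).
static void AssertPrimAndKruskalAgree<TVertex, TEdge>(
    IUndirectedGraph<TVertex, TEdge> graph,
    Func<TEdge, double> weight)
    where TEdge : IEdge<TVertex>
{
    double primWeight = graph.MinimumSpanningTreePrim(weight).Sum(weight);
    double kruskalWeight = graph.MinimumSpanningTreeKruskal(weight).Sum(weight);

    // with distinct edge weights the edge sets could be compared directly;
    // in general only the total weights are guaranteed to match
    Assert.AreEqual(primWeight, kruskalWeight, 1e-9);
}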

There is a corner case that needs to be checked: if all the algorithms are no-ops, i.e. they don’t do anything, their solutions will always match!
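A cheap guard against that corner case is to also assert that the result really has the shape of a spanning tree; for a connected graph, that means exactly VertexCount - 1 edges. Again a sketch, with approximate QuickGraph member names:

// Guard against no-op implementations (sketch; member names approximate):
// on a connected graph, a spanning tree has exactly VertexCount - 1 edges,
// so an algorithm that returns nothing is caught immediately.
static void AssertSpanningTreeSize<TVertex, TEdge>(
    IUndirectedGraph<TVertex, TEdge> connectedGraph,
    ICollection<TEdge> treeEdges)
    where TEdge : IEdge<TVertex>
{
    Assert.AreEqual(connectedGraph.VertexCount - 1, treeEdges.Count);
}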

Happy New Year!

posted on Thursday, January 01, 2009 9:03:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, December 30, 2008

I recently implemented a Disjoint-Set data structure in QuickGraph; it is the main building block of Kruskal’s minimum spanning tree algorithm. This is a fun data structure to look at, as its purpose is quite different from the ‘main’ BCL collections. The disjoint-set is useful to partition elements into sets, and defines two main operations for that purpose: Find returns the set an element belongs to (and can be used to check whether two elements are in the same set), and Union merges two sets.

The disjoint-set implementation is in the QuickGraph sources, if you are curious about the details.
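
For readers who do not want to open the sources, here is a minimal textbook-style sketch of the two operations (union by rank plus path compression, assuming System.Collections.Generic). It only illustrates the semantics; it is not the QuickGraph ForestDisjointSet code:

// Textbook disjoint-set (union by rank + path compression), for illustration
// only; this is not the QuickGraph ForestDisjointSet implementation.
class SimpleDisjointSet<T>
{
    class Node { public Node Parent; public int Rank; }

    readonly Dictionary<T, Node> nodes = new Dictionary<T, Node>();

    public void MakeSet(T item)
    {
        var n = new Node();
        n.Parent = n; // each element starts alone in its own set
        this.nodes.Add(item, n);
    }

    Node Find(Node n)
    {
        if (n.Parent != n)
            n.Parent = Find(n.Parent); // path compression
        return n.Parent;
    }

    public bool AreInSameSet(T left, T right)
    {
        return Find(this.nodes[left]) == Find(this.nodes[right]);
    }

    // returns true if a merge actually happened
    public bool Union(T left, T right)
    {
        Node l = Find(this.nodes[left]), r = Find(this.nodes[right]);
        if (l == r)
            return false;
        if (l.Rank < r.Rank) { var t = l; l = r; r = t; }
        r.Parent = l; // union by rank: attach the shallower root
        if (l.Rank == r.Rank)
            l.Rank++;
        return true;
    }
}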

Testing the disjoint-set

There is not much left to TDD when it comes to writing data structures. Such data structures are usually described in detail in an article, and you ‘just’ have to follow the author’s instructions to implement them. Nevertheless, unit testing is really critical to ensure that the implementation is correct – but there is a risk of having to re-implement the algorithm in order to test it.

For the disjoint-set, I used two tools developed at RiSE: Code Contracts and… Pex. When implementing a data structure from the literature, I usually start by ‘dumping’ the code and the contracts. As much as possible, any invariant or property that the author describes should be translated into contracts. That gives Pex more opportunities to find bugs in my code.

Contracts First

For example, the contracts for the Union method look as follows:

private bool Union(Element left, Element right)
{
    Contract.Requires(left != null);
    Contract.Requires(right != null);
    Contract.Ensures(Contract.Result<bool>() 
        ? Contract.OldValue(this.SetCount) - 1 == this.SetCount
        : Contract.OldValue(this.SetCount) == this.SetCount
        );
    Contract.Ensures(this.FindNoCompression(left) == this.FindNoCompression(right));

    // ... actual union implementation follows ...
}

The first two Requires clauses do the usual null checks. The first Ensures clause checks that if Union returns true, a merge has been performed and the number of sets has decreased by one; otherwise, the set count is unchanged. The last Ensures checks that left and right belong to the same set at the end of the union.

Note that so far, I did not have to provide any kind of implementation details. The other methods in the implementation receive the same treatment.

A Parameterized Unit Test

I wrote a single parameterized unit test while writing/debugging the implementation of the forest. It could probably have been refactored into many smaller tests, but out of laziness I used a single one.

The parameterized unit test implements a common scenario: add elements to the disjoint-set, then apply a bunch of union operations. Along the way, we can rely on test assertions and code contracts to check the correctness of the implementation.

Let’s start with the test method signature:

[PexMethod]
public void Unions(int elementCount, [PexAssumeNotNull]KeyValuePair<int,int>[] unions)
{

The test takes a number of elements to add and a sequence of unions to apply. The input data needs to be refined to be useful. To that end, we add assumptions, in the form of calls to PexAssume, to tell Pex how it should shape the input data. In this case, we want to ensure that elementCount is positive and relatively small, and that the values in unions are within the [0..elementCount) range.

    PexAssume.IsTrue(0 < elementCount);
    PexSymbolicValue.Minimize(elementCount);
    PexAssume.TrueForAll(
        unions, 
        u => 0 <= u.Key && u.Key < elementCount && 
             0 <= u.Value && u.Value < elementCount
             );

Now that we have some data, we can start writing the first part of the scenario: filling up the disjoint-set. To do so, we simply add the integers from [0..elementCount). Along the way, we check that Contains, ElementCount, and SetCount all behave as expected:

    var target = new ForestDisjointSet<int>();
    // fill up with 0..elementCount - 1
    for (int i = 0; i < elementCount; i++)
    {
        target.MakeSet(i);
        Assert.IsTrue(target.Contains(i));
        Assert.AreEqual(i + 1, target.ElementCount);
        Assert.AreEqual(i + 1, target.SetCount);
    }

The second part gets more interesting. Each element of the unions array describes a union between two elements:

    // apply Union to each pair (unions[i].Key, unions[i].Value)
    for (int i = 0; i < unions.Length; i++)
    {
        var left = unions[i].Key;
        var right = unions[i].Value;

        var setCount = target.SetCount;
        bool unioned = target.Union(left, right);

There are two things we want to assert here: first, that left and right now belong to the same set; second, that SetCount has been updated correctly: if unioned is true, it should have decreased by one.

        // should be in the same set now
        Assert.IsTrue(target.AreInSameSet(left, right));
        // if unioned, the set count decreased by 1
        PexAssert.ImpliesIsTrue(unioned, () => setCount - 1 == target.SetCount);
    }
}

From this parameterized unit test, I could work on the implementation and refine it again and again until all the tests passed (I did not write any other test code).

Happy Ending

The resulting test suite generated by Pex is summarized in the screenshot below: the number of elements does not really matter; what is interesting is the sequence of unions performed. This test suite achieves 100% coverage of the methods under test :).

[Screenshot: the Pex-generated test suite]

In fact, the Union method involved some tricky branches to cover, due to some optimization occurring in the disjoint-set. Pex managed to generate unit tests for each of the branches.

[Screenshot: tests generated for the branches of the Union method]

Thanks to the contracts, the test assertions, and the high code coverage of the generated test suite, I now have good confidence that my code is properly tested. The End!

(Next time, we will talk about testing minimum spanning tree implementations…)

posted on Tuesday, December 30, 2008 11:41:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, December 18, 2008

My second son Emile de Halleux was born on Monday, weighing 4.204 kg and with lots of hair. He and his mother are enjoying a nice hot fire while the winter storm is slamming Seattle (glad he decided to come out before the icy mess). Happy Holidays!

posted on Thursday, December 18, 2008 4:19:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]
# Thursday, December 04, 2008

I just released a new version of QuickGraph. Among many small improvements here and there, the big news is:

  • .net 2.0 support is back! QuickGraph now builds for both .net 2.0 and .net 3.5 (with extension methods).
  • The Dijkstra algorithm uses a Fibonacci heap under the hood, which is way more efficient. Courtesy of Roy Williams!
  • Better serialization support and other bug fixes.

Enjoy, Peli.

posted on Thursday, December 04, 2008 12:28:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, December 02, 2008

Very exciting: I just published my first video on Channel 9. In the future, I’ll be posting more interviews about researchers and projects in our group in Redmond. Expect more videos on Pex, Code Contracts, CHESS, etc. over there.

http://channel9.msdn.com/posts/Peli/The-RiSE-of-Research-in-Software-Engineering/

posted on Tuesday, December 02, 2008 12:53:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 25, 2008

We just published a tutorial on the Code Digger on CodeProject, enjoy…

http://www.codeproject.com/KB/testing/pex.aspx

posted on Tuesday, November 25, 2008 8:13:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Friday, November 14, 2008

We will give a session on Pex at the Seattle Code Camp – Saturday 11/15 at 2:45pm. We will show the new Code Digger and how Pex can help you…

posted on Friday, November 14, 2008 1:35:49 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, November 05, 2008

Check out the session on Code Contracts and Pex on Channel 9. You will learn about the cool new API for expressing pre-conditions, post-conditions, and invariants in your favorite language – i.e. design by contract (DbC) for .NET – the new Code Digger experience in Pex, and, most importantly, how DbC and Pex play well together.

http://channel9.msdn.com/pdc2008/tl51/


posted on Wednesday, November 05, 2008 11:10:22 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]
# Saturday, October 25, 2008

 

I’ll be at the MSR@PDC booth the entire week, don’t forget to say hello and get your demo.


posted on Saturday, October 25, 2008 7:29:17 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Tuesday, October 21, 2008

Wondering what we’ve been up to for the last few months? We’ve been building a very cool development experience on top of Pex that we call Code Digging. Check out Nikolai’s blog post on what it means to you!


posted on Tuesday, October 21, 2008 9:49:02 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]