# Tuesday, December 30, 2008

I recently implemented a Disjoint-Set data structure in QuickGraph, which is the main building block of the Kruskal’s minimum spanning tree algorithm. This is a fun data-structure to look at, as it purpose is quite different from the ‘main’ BCL collections. The disjoint-set is useful to partition elements into sets, and defines 2 main operation to that purpose: Find, finds the set an element belongs to (can be used to check if 2 elements are in the same set), Union merges two sets.

The source of the disjoint-set is in the QuickGraph source, if you are curious about the details.

Testing the disjoint-set

There is not much left to TDD when it comes to write data structure. Such data structure are usually described in details in an article, and you ‘just’ have to follow the authors instruction to implement. Nevertheless, unit testing is really critical to ensure that the implementation is correct – but there is a risk of having to re-implement the algorithm to test it.

For the disjoint-set, I used 2 tools developed at RiSE: Code Contracts and… Pex. When implementing data structure from the literature, I usually start ‘dumping’ the code and the contracts. As much as possible, any invariant or property that the author describes should be translated to contracts. It will give more opportunities for Pex to find bugs in my code.

Contracts First

For example, the contracts for the Union method look as follows:

private bool Union(Element left, Element right)
{
    Contract.Requires(left != null);
    Contract.Requires(right != null);
    Contract.Ensures(Contract.Result<bool>() 
        ? Contract.OldValue(this.SetCount) - 1 == this.SetCount
        : Contract.OldValue(this.SetCount) == this.SetCount
        );
    Contract.Ensures(this.FindNoCompression(left) == this.FindNoCompression(right));

The two first requires clauses do the usual null checks. The first ensures clause checks that if Union returns true, a merge has been done and the number of sets has decreased by 1. The last ensures checks that left and right belong to the same set at the end of the union.

Note that so far, I did not have to provide any kind of implementation details. The other methods in the implementation receive the same treatment.

A Parameterized Unit Test

I wrote a single parameterized unit test while writing/debugging the implementation of the forest. It could probably have been re-factored into many smaller tests, but for the sake of laziness, I used a single one.

The parameterized unit test implement a common scenario: add elements to the disjoint-set, then apply a bunch of union operations. Along the way, we can rely on test assertion and code contracts to check the correctness of our implementation.

Let’s start with the test method signature:

[PexMethod]
public void Unions(int elementCount, [PexAssumeNotNull]KeyValuePair<int,int>[] unions) {

The test takes a number of element to add and a sequence of unions to apply. The input data as is needs to be refined to be useful. In that sense, we add assumptions, under the form of calls to PexAssume, to tell Pex ‘how it should shape’ the input data. In this case, we want to ensure that elementCount is positive and relatively small; and that the values in unions are within the [0…elementCount) range.

    PexAssume.IsTrue(0 < elementCount);
    PexSymbolicValue.Minimize(elementCount);
    PexAssume.TrueForAll(
        unions, 
        u => 0 <= u.Key && u.Key < elementCount && 
             0 <= u.Value && u.Value < elementCount
             );

Now that we have some data, we can start writing the first part of the scenario: filling the disjoint-set. To do so, we simply add the integers from [0..elementCount). Along the way, we check that the Contains, ElementCount, SetCount all behave as expected:

    var target = new ForestDisjointSet<int>();
    // fill up with 0..elementCount - 1
    for (int i = 0; i < elementCount; i++)
    {
        target.MakeSet(i);
        Assert.IsTrue(target.Contains(i));
        Assert.AreEqual(i + 1, target.ElementCount);
        Assert.AreEqual(i + 1, target.SetCount);
    }

The second part gets more interesting. Each element of the unions array is a ‘union’ action between 2 elements:

    // apply Union for pairs unions[i], unions[i+1]
    for (int i = 0; i < unions.Length; i++)
    {
        var left = unions[i].Key;
        var right= unions[i].Value;

        var setCount = target.SetCount;
        bool unioned = target.Union(left, right);

There is 2 things we want to assert here: first, then left and right now belong to the same set. Secondly, that the SetCount has been updated correctly: if unioned is true, it should have decreased by one.

        // should be in the same set now
      
Assert.IsTrue(target.AreInSameSet(left, right));
        // if unioned, the count decreased by 1
      
PexAssert.ImpliesIsTrue(unioned, () => setCount - 1 == target.SetCount);
    }
}

From this parameterized unit test, I could work on the implementation, and refine again and again until it had all passing tests (I did not write any other test code). 

Happy Ending

The resulting test suite generated by Pex is summarized in the screenshot below: the number of elements does not really matter, what is interesting is the sequence of unions performed. This test suite achieves 100% coverage of the methods under test :).

image

In fact, the Union method involved some tricky branches to cover, due to some optimization occurring in the disjoint-set. Pex managed to generate unit tests for each of the branches.

image

Thanks to the contracts, the test assertions, and the high code coverage of the generated test suite, I have now a good confidence that my code is properly tested. The End!

(Next time, we will talk about testing minimum spanning tree implementations…)

posted on Tuesday, December 30, 2008 11:41:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, December 18, 2008

My second son Emile de Halleux was born on Monday with 4.204g and lots of hair. He and his mother are enjoying a nice hot fire while the winter storm is slamming Seattle (glad he decided to come out before the icy mess). Happy Holidays!

posted on Thursday, December 18, 2008 4:19:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]
# Thursday, December 04, 2008

I just released a new version of QuickGraph. Among many small improvements here and there, the big news are:

  • .net 2.0 support is back! QuickGraph now builds both for .net 2.0 and .net 3.5 (with extension methods).
  • The Dijkstra algorithm use a Fibonacci Heap under the hood, which is way more efficient. Courtesy of Roy Williams!
  • better serialization support and other bug fixes.

Enjoy, Peli.

posted on Thursday, December 04, 2008 12:28:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, December 02, 2008

Very exciting, I just published my first video on channel9. In the future, I’ll posting more interviews about researchers and projects that happens in our group in Redmond. Expect to get more videos on Pex, Code Contracts, CHESS, etc… over there.

http://channel9.msdn.com/posts/Peli/The-RiSE-of-Research-in-Software-Engineering/

posted on Tuesday, December 02, 2008 12:53:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]