# Tuesday, December 30, 2008

I recently implemented a Disjoint-Set data structure in QuickGraph, which is the main building block of the Kruskal’s minimum spanning tree algorithm. This is a fun data-structure to look at, as it purpose is quite different from the ‘main’ BCL collections. The disjoint-set is useful to partition elements into sets, and defines 2 main operation to that purpose: Find, finds the set an element belongs to (can be used to check if 2 elements are in the same set), Union merges two sets.

The source of the disjoint-set is in the QuickGraph source, if you are curious about the details.

Testing the disjoint-set

There is not much left to TDD when it comes to write data structure. Such data structure are usually described in details in an article, and you ‘just’ have to follow the authors instruction to implement. Nevertheless, unit testing is really critical to ensure that the implementation is correct – but there is a risk of having to re-implement the algorithm to test it.

For the disjoint-set, I used 2 tools developed at RiSE: Code Contracts and… Pex. When implementing data structure from the literature, I usually start ‘dumping’ the code and the contracts. As much as possible, any invariant or property that the author describes should be translated to contracts. It will give more opportunities for Pex to find bugs in my code.

Contracts First

For example, the contracts for the Union method look as follows:

private bool Union(Element left, Element right)
{
    Contract.Requires(left != null);
    Contract.Requires(right != null);
    Contract.Ensures(Contract.Result<bool>() 
        ? Contract.OldValue(this.SetCount) - 1 == this.SetCount
        : Contract.OldValue(this.SetCount) == this.SetCount
        );
    Contract.Ensures(this.FindNoCompression(left) == this.FindNoCompression(right));

The two first requires clauses do the usual null checks. The first ensures clause checks that if Union returns true, a merge has been done and the number of sets has decreased by 1. The last ensures checks that left and right belong to the same set at the end of the union.

Note that so far, I did not have to provide any kind of implementation details. The other methods in the implementation receive the same treatment.

A Parameterized Unit Test

I wrote a single parameterized unit test while writing/debugging the implementation of the forest. It could probably have been re-factored into many smaller tests, but for the sake of laziness, I used a single one.

The parameterized unit test implement a common scenario: add elements to the disjoint-set, then apply a bunch of union operations. Along the way, we can rely on test assertion and code contracts to check the correctness of our implementation.

Let’s start with the test method signature:

[PexMethod]
public void Unions(int elementCount, [PexAssumeNotNull]KeyValuePair<int,int>[] unions) {

The test takes a number of element to add and a sequence of unions to apply. The input data as is needs to be refined to be useful. In that sense, we add assumptions, under the form of calls to PexAssume, to tell Pex ‘how it should shape’ the input data. In this case, we want to ensure that elementCount is positive and relatively small; and that the values in unions are within the [0…elementCount) range.

    PexAssume.IsTrue(0 < elementCount);
    PexSymbolicValue.Minimize(elementCount);
    PexAssume.TrueForAll(
        unions, 
        u => 0 <= u.Key && u.Key < elementCount && 
             0 <= u.Value && u.Value < elementCount
             );

Now that we have some data, we can start writing the first part of the scenario: filling the disjoint-set. To do so, we simply add the integers from [0..elementCount). Along the way, we check that the Contains, ElementCount, SetCount all behave as expected:

    var target = new ForestDisjointSet<int>();
    // fill up with 0..elementCount - 1
    for (int i = 0; i < elementCount; i++)
    {
        target.MakeSet(i);
        Assert.IsTrue(target.Contains(i));
        Assert.AreEqual(i + 1, target.ElementCount);
        Assert.AreEqual(i + 1, target.SetCount);
    }

The second part gets more interesting. Each element of the unions array is a ‘union’ action between 2 elements:

    // apply Union for pairs unions[i], unions[i+1]
    for (int i = 0; i < unions.Length; i++)
    {
        var left = unions[i].Key;
        var right= unions[i].Value;

        var setCount = target.SetCount;
        bool unioned = target.Union(left, right);

There is 2 things we want to assert here: first, then left and right now belong to the same set. Secondly, that the SetCount has been updated correctly: if unioned is true, it should have decreased by one.

        // should be in the same set now
      
Assert.IsTrue(target.AreInSameSet(left, right));
        // if unioned, the count decreased by 1
      
PexAssert.ImpliesIsTrue(unioned, () => setCount - 1 == target.SetCount);
    }
}

From this parameterized unit test, I could work on the implementation, and refine again and again until it had all passing tests (I did not write any other test code). 

Happy Ending

The resulting test suite generated by Pex is summarized in the screenshot below: the number of elements does not really matter, what is interesting is the sequence of unions performed. This test suite achieves 100% coverage of the methods under test :).

image

In fact, the Union method involved some tricky branches to cover, due to some optimization occurring in the disjoint-set. Pex managed to generate unit tests for each of the branches.

image

Thanks to the contracts, the test assertions, and the high code coverage of the generated test suite, I have now a good confidence that my code is properly tested. The End!

(Next time, we will talk about testing minimum spanning tree implementations…)

posted on Tuesday, December 30, 2008 11:41:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, December 18, 2008

My second son Emile de Halleux was born on Monday with 4.204g and lots of hair. He and his mother are enjoying a nice hot fire while the winter storm is slamming Seattle (glad he decided to come out before the icy mess). Happy Holidays!

posted on Thursday, December 18, 2008 4:19:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]
# Thursday, December 04, 2008

I just released a new version of QuickGraph. Among many small improvements here and there, the big news are:

  • .net 2.0 support is back! QuickGraph now builds both for .net 2.0 and .net 3.5 (with extension methods).
  • The Dijkstra algorithm use a Fibonacci Heap under the hood, which is way more efficient. Courtesy of Roy Williams!
  • better serialization support and other bug fixes.

Enjoy, Peli.

posted on Thursday, December 04, 2008 12:28:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, December 02, 2008

Very exciting, I just published my first video on channel9. In the future, I’ll posting more interviews about researchers and projects that happens in our group in Redmond. Expect to get more videos on Pex, Code Contracts, CHESS, etc… over there.

http://channel9.msdn.com/posts/Peli/The-RiSE-of-Research-in-Software-Engineering/

posted on Tuesday, December 02, 2008 12:53:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 25, 2008

We just published a tutorial on the code digger on CodeProject, enjoy…

http://www.codeproject.com/KB/testing/pex.aspx

posted on Tuesday, November 25, 2008 8:13:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Friday, November 14, 2008

We will give a session on Pex at the Seattle Code Camp – Saturday 11/15 at 2.45pm. We will show the new Code Digger and how Pex can help you…

posted on Friday, November 14, 2008 1:35:49 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, November 05, 2008

Check out the session on Code Contracts and Pex on Channel 9. You will learn about the new cool API to express pre-conditions, post-conditions and invariants in your favorite language – i.e. design by contracts (DbC) for .NET and the new Code Digger experience in Pex, and most importantly how DbC and Pex play well together.

http://channel9.msdn.com/pdc2008/tl51/

image

posted on Wednesday, November 05, 2008 11:10:22 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]
# Saturday, October 25, 2008

 

I’ll be at the MSR@PDC booth the entire week, don’t forget to say hello and get your demo.

image

posted on Saturday, October 25, 2008 7:29:17 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]
# Tuesday, October 21, 2008

Wonder what we’ve been up to for the last months… We’ve been building a very cool development experience on top of Pex that we call Code Digging. Check out Nikolai’s blog post on what it means to you!       .

image

posted on Tuesday, October 21, 2008 9:49:02 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [0]

Have you ever written code that directly used the .NET File APIs? We probably all did although we knew it would make the code less testable and dependent on the file system state. As bad as it sounds, it really requires a lot of discipline and work to avoid this: one would need to create an abstraction layer over the file system, which is not a short task (think long/tedious).

// in how many ways can this break?
public static void CleanDirectory(string path)
{
    if (Directory.Exists(path))
        Directory.Delete(path, true);
    Directory.CreateDirectory(path);
}

Abstraction

Fortunately, there always someone else who got motivated at some point. Ade Miller digged an abstraction of the File System, the IFileSystem interface, that Brad Wilson had written for the CodePlex client project. Very nice since it provides a solid foundation for cleanly abstracting from the File System, and thus increase the testability of our code.

// a little better, testable code at least
public static void
CleanDirectory(IFileSystem fs, string path) { if (fs.DirectoryExists(path)) fs.DeleteDirectory(path, false); fs.CreateDirectory(path); }

Mocking

So with this interface we can write code that we’ll be able to test in isolation from the physical file system. That’s great but there is still a lot of work on the should of the developer: the developer will have write intricate scenarios involving mocks to simulate the different possible configurations of the file system. No matter which mock framework (Moq, Rhino, Isolator, …), he’ll be using, (1) it’s going to be painful, (2) he’ll miss cases. It’s probably easy to write a single “happy path”, but especially with the file system there are quite some realistic “unhappy paths”.

This test case uses Moq to create the scenario where there is a directory already. Although Moq has a very slick API to set expectations, it is still a lot of work to write this basic scenario. (And what exactly is the meaning of “Expect”, the delegate or expression inside, “Returns” and “Callback”?)

[TestMethod]
public void DeletesAndCreateNewDirectory()
{
    var fs = new Mock<IFileSystem>();
    string path = "foo";

    fs.Expect(f => f.DirectoryExists(path)).Returns(true);
    fs.Expect(f => f.DeleteDirectory(path, false)).Callback( () => Console.WriteLine("deleted"));
    fs.Expect(f => f.CreateDirectory(path)).Callback(() => Console.WriteLine("created"));

    DirectoryExtensions.CleanDirectory(fs.Object, path);
}

Modeling

We had our intern, Soonho Kong, work on a Parameterized Model of the File System, built on top of the IFileSystem interface (yes that same interface Brad Wilson published on CodePlex). We say that the model is parameterized because it uses the Pex choices API to create arbitrary initial File System states; Pex “chooses” each such state (actually, Pex carefully computes the state using a constraint solver) to trigger different code paths in the code. You can think of each choice as a new parameter to the test. Or to put this with an example: if your code checks that the file “foo.txt” exists, then the parameterized model would choose a file system state that would contain a “foo.txt” file (or not, in another state, to cover both branches of the program).

So what does it mean for you? Well, the way you write tests that involve the file system changes radically. You simply need to pass the file system model to your implementation. The model is an under-approximation of the real file system (which means that we didn’t model every single nastiness that can occur when the moon is full), but it definitely captures more practically relevant corner cases than we (humans) usually think about. Let’s see this in the following test:

[PexMethod]
public void CleanDirectory()
{
    var fs = new PFileSystem();
    string path = @"\foo";
    try
    {
        DirectoryExtensions.CleanDirectory(fs, path);

        // assert: the directory exists and is empty
        Assert.IsTrue(fs.DirectoryExists(path));
        Assert.AreEqual(0, fs.GetFiles(path).Length);
    }
    finally
    {
        fs.Dir();
    }
}

When we run Pex, we get 7 generated tests. In fact, Pex finds an interesting bug that occurs when a file with the name of the directory to clean already exists. In the Pex Exploration Results window, you can see a ‘dir’-like output of the file system model associated with a particular test case (the fs.Dir() method call outputs that text to the console which Pex captures).

image

This bug is the kind of corner-case that makes testing the file system so fun/hard. Thanks to the parameterized model (and Soonho), we got it for free. Note also that the assertion in our test is pretty powerful since it must be true for any configuration of the file system (it almost smells like a functional specification to me):

// assert: the directory exists and is empty
Assert.IsTrue(fs.DirectoryExists(path));
Assert.AreEqual(0, fs.GetFiles(path).Length);

Happy modeling!

The full source of PFileSystem will be available in the next version of Pex (0.8)..

posted on Tuesday, October 21, 2008 12:15:09 PM (Pacific Daylight Time, UTC-07:00)  #    Comments [6]