# Wednesday, January 14, 2009

Today, we released the 0.9.40105.0 version of Pex. The major highlights of this release are: a Pre-Release license for Visual Studio 2008, better test framework integration, and partial support for Visual Basic.NET and F# (read the full release notes).

CHESS, another project from RiSE, also released its first public download today. CHESS finds and reproduces heisenbugs in concurrent programs. It uses the Pex code instrumentation technology to monitor and control the .NET threading APIs, and one day we hope to combine Pex and CHESS…

Enjoy and don’t forget to give us your feedback through the MSDN forums.

posted on Wednesday, January 14, 2009 12:21:29 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Saturday, January 10, 2009

Stubs is the stub-generation framework that we announced earlier with Pex. The framework generates stubs (not mocks) that are

  • lightweight, i.e. relying on delegates only,
  • strongly typed, i.e. no magic strings – refactoring friendly,
  • source code generated, i.e. no dynamic code or expression trees,
  • easy to debug, i.e. you can break into and step through stubs,
  • minimalistic API (there is almost none),
  • friendly for Pex!

More details about Stubs are also available in the stubs reference document.

Quick Start

Let’s start from a simple interface IFileSystem for which we want to write stubs:

public interface IFileSystem
{
    void WriteAllText(string fileName, string content);
    string ReadAllText(string fileName);
}

Stubs relies solely on delegates to attach behaviors, and leverages the lambda syntax from C# 3.0 to accomplish pretty much anything:

// SFileSystem was generated by Stubs and stubs IFileSystem
var stub = new SFileSystem();

// always returns "..."
stub.ReadAllText = (me, file) => "...";

// expectations: checks that file == "foo.txt"
stub.ReadAllText = (me, file) =>
{
    Assert.AreEqual("foo.txt", file);
    return "...";
};

// storing side effects in closures: written saves the content
string written = null;
stub.WriteAllText = (me, file, content) => written = content;

// hey, we can do whatever we want!
stub.ReadAllText = (me, file) =>
{
    if (file == null) throw new ArgumentNullException("file");
    return "...";
};

// downcast to the interface to use it
IFileSystem fs = stub;

Anything is possible… as long as C# (or your favorite language) allows it.
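To make the shape concrete, here is a hand-written stand-in for a generated stub, driving a hypothetical `FileCopier` class under test. The `me` parameter of the generated delegates is dropped for brevity, and all names below are illustrative assumptions, not the Stubs-generated API:

```csharp
using System;

public interface IFileSystem
{
    void WriteAllText(string fileName, string content);
    string ReadAllText(string fileName);
}

// Hand-written stand-in for what Stubs generates: one delegate field per method.
public class FakeFileSystem : IFileSystem
{
    public Func<string, string> ReadAllTextStub;
    public Action<string, string> WriteAllTextStub;

    public string ReadAllText(string fileName)
    {
        return this.ReadAllTextStub(fileName);
    }

    public void WriteAllText(string fileName, string content)
    {
        this.WriteAllTextStub(fileName, content);
    }
}

// Hypothetical code under test: copies a file through the abstraction.
public class FileCopier
{
    public void Copy(IFileSystem fs, string from, string to)
    {
        fs.WriteAllText(to, fs.ReadAllText(from));
    }
}
```

Attaching lambdas to the delegate fields then lets a test observe the copy without ever touching the disk.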

Anatomy of a Stub

Each stubbed method has an associated delegate field that can be set freely (e.g. WriteAllText and ReadAllText). If this delegate field is set, it is used when the method is called; otherwise a default action occurs. Let’s see this with a simplified stub of the IFileSystem interface, SFileSystem, which shows how IFileSystem.WriteAllText is implemented:

class SFileSystem
    : StubBase<SFileSystem>
    , IFileSystem
{
    public Action<SFileSystem, string, string> WriteAllText; // attach here

    void IFileSystem.WriteAllText(string fileName, string content)
    {
        var stub = this.WriteAllText;
        if (stub != null)
            stub(this, fileName, content); // your code executed here
    }
}

The actual generated code may look more complicated because it contains custom attributes for debugging, comments, and globally qualified type names to avoid name clashes.


Code generation: How does it work?

Stubs is a single file generator that pre-generates stubs as source code. It also monitors the build and regenerates the source code when a change occurs.


The stub generation is configured through an XML file (a .stubx file) that specifies which code to generate and how. The generated code is saved in a ‘Designer’ file, similarly to other code generation tools (typed datasets, etc.).


Great debugging experience

A cool side effect of simply using delegates: you can step through and debug your stubs! This is usually not the case with mock frameworks that use dynamic code generation. Below, you can see that one can set breakpoints in the body of a stub and debug as usual.


Where do I get it?

The Stubs framework comes with Pex, but you can use it in any unit testing activity. It provides a simple, lightweight alternative for defining stubs for testing. Pex can be downloaded from http://research.microsoft.com/pex .

posted on Saturday, January 10, 2009 7:41:32 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Thursday, January 08, 2009

Following the implementation of the disjoint-set, another fun algorithm to implement was the offline least common ancestor problem. Tarjan proposed an algorithm based on the disjoint-set and a DFS traversal. Given that I now have all those ingredients in QuickGraph, it was just a matter of hooking the implementation together with the right events. The core of the implementation looks as follows (and looks quite elegant to me):

var gpair = GraphExtensions.ToAdjacencyGraph(this.pairs); // for fast lookup
var disjointSet = new ForestDisjointSet<TVertex>();
var vancestors = new Dictionary<TVertex, TVertex>();
var dfs = new DepthFirstSearchAlgorithm<TVertex, TEdge>(this, this.VisitedGraph,
    new Dictionary<TVertex, GraphColor>(this.VisitedGraph.VertexCount));

dfs.InitializeVertex += (sender, args) => disjointSet.MakeSet(args.Vertex);
dfs.DiscoverVertex += (sender, args) => vancestors[args.Vertex] = args.Vertex;
dfs.TreeEdge += (sender, args) =>
{
    var edge = args.Edge;
    disjointSet.Union(edge.Source, edge.Target);
    vancestors[disjointSet.FindSet(edge.Source)] = edge.Source;
};
dfs.FinishVertex += (sender, args) =>
{
    foreach (var e in gpair.OutEdges(args.Vertex))
        if (dfs.VertexColors[e.Target] == GraphColor.Black)
            this.ancestors[EdgeExtensions.ToVertexPair(e)] = vancestors[disjointSet.FindSet(e.Target)];
};

// go!
dfs.Compute();


posted on Thursday, January 08, 2009 12:32:54 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Sunday, January 04, 2009

The k-shortest paths problem consists of computing the first k shortest paths between two vertices in a directed graph. If you put this in the context of route planning, it gives you k alternative routes in case the shortest path is blocked by snow :) While the single-source shortest path problem is well known and implemented in many languages, there are not many implementations available for this problem, although it has been extensively studied. After looking around, I stumbled on a nice article comparing various approaches. The authors pointed out an algorithm from 1959 by Hoffman and Pavley that solves this problem (there are actually many others). This algorithm looked like a good fit:

  • it requires a single call to a single-source shortest path algorithm. Other approaches require as many as kn calls to a shortest path algorithm,
  • it sounded simple and did not require new specialized data structures.

I want to try it!

The algorithm is available in QuickGraph 3.1.40104.00 and up. You can take a look at it at http://www.codeplex.com/quickgraph/SourceControl/changeset/view/29858#377982. To use it on a BidirectionalGraph:

IBidirectionalGraph<TVertex, TEdge> g = …;
foreach (IEnumerable<TEdge> path in g.RankedShortestPathHoffmanPavley(weights, source, goal, 4))
{
    // paths are returned in order of increasing weight
}

A glimpse at the Hoffman-Pavley algorithm

The algorithm works in two phases.

In the first phase, we build a minimum ‘successor’ tree towards the goal vertex. This tree can be used to build a shortest path from any (reachable) vertex to the goal vertex. To build it, we can simply apply Dijkstra’s shortest path algorithm to the reversed graph. This can be done in a couple of lines with QuickGraph.
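Those couple of lines are elided here, but the idea of phase 1 can be sketched independently of QuickGraph (integer vertices and all names below are illustrative assumptions, not QuickGraph’s API): run Dijkstra from the goal over the reversed edges, recording for each vertex its distance to the goal and its next hop (‘successor’) towards it.

```csharp
using System;
using System.Collections.Generic;

public static class SuccessorTree
{
    // edges: (source, target, weight); fills dist-to-goal and successor maps.
    public static void Compute(
        List<Tuple<int, int, double>> edges, int goal,
        out Dictionary<int, double> dist, out Dictionary<int, int> successor)
    {
        // Build the reversed adjacency list: target -> (source, weight).
        var reversed = new Dictionary<int, List<Tuple<int, double>>>();
        foreach (var e in edges)
        {
            if (!reversed.ContainsKey(e.Item2))
                reversed[e.Item2] = new List<Tuple<int, double>>();
            reversed[e.Item2].Add(Tuple.Create(e.Item1, e.Item3));
        }

        dist = new Dictionary<int, double> { { goal, 0.0 } };
        successor = new Dictionary<int, int>();
        // Naive priority queue (sorted set of (distance, vertex)): fine for a sketch.
        var queue = new SortedSet<Tuple<double, int>> { Tuple.Create(0.0, goal) };
        while (queue.Count > 0)
        {
            var min = queue.Min;
            queue.Remove(min);
            int u = min.Item2;
            if (min.Item1 > dist[u]) continue; // stale entry
            if (!reversed.ContainsKey(u)) continue;
            foreach (var edge in reversed[u])
            {
                int v = edge.Item1;
                double d = dist[u] + edge.Item2;
                if (!dist.ContainsKey(v) || d < dist[v])
                {
                    dist[v] = d;
                    successor[v] = u; // next hop towards the goal in the original graph
                    queue.Add(Tuple.Create(d, v));
                }
            }
        }
    }
}
```

Following the successor map from any reachable vertex then spells out its shortest path to the goal.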

As with many other k-shortest path algorithms, the second phase works by building deviation paths and picking the best one. In the case of the Hoffman-Pavley algorithm, it works as follows: pick the latest shortest path; for each vertex of this path, build a deviation path (more on this later) for each out-edge and add it to a priority queue. Then start again:

var queue = new PriorityQueue<Path>();
while (queue.Count > 0)
{
    var path = queue.Dequeue();
    foreach (var vertex in path)
        foreach (var edge in graph.OutEdges(vertex))
            queue.Enqueue(CreateDeviation(path, vertex, edge));
}

A deviation path is composed of three parts:

  1. the initial part of the ‘seeding’ path, i.e. the edges before the deviation edge,
  2. the deviation edge,
  3. the remaining shortest path to the goal.

When we build the deviation path, we also compute its weight. A nice property of deviation paths is that they can be ‘built’ only when needed. This saves a lot of space and computation, as most deviation paths will probably not end up in the winning set of paths – instead of storing a full path, we store a path index and an edge index.
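A minimal sketch of that compact representation, with hypothetical names (this is not QuickGraph’s actual type): a deviation stores only the two indices and its weight, which can be computed from the seed path’s prefix weights and the distance-to-goal map from phase 1.

```csharp
// Hypothetical compact deviation-path record: the actual edges are not stored;
// the full path is only materialized if the deviation wins, by rebuilding it
// from the seed path, the deviation edge, and the shortest-path successor tree.
public struct Deviation
{
    public int SeedPathIndex;  // which already-ranked path we deviate from
    public int EdgeIndex;      // index in the seed path where we deviate
    public double Weight;      // total weight, computed at creation time

    // weight = prefix of the seed path up to the deviation point
    //        + the deviation edge's weight
    //        + shortest distance from the deviation edge's target to the goal.
    public static double WeightOf(
        double[] seedPrefixWeights, int edgeIndex,
        double deviationEdgeWeight, double distToGoalFromTarget)
    {
        return seedPrefixWeights[edgeIndex] + deviationEdgeWeight + distToGoalFromTarget;
    }
}
```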

That’s it! The details contain more code to deal with self-edges and loops but the main idea is there. This is definitely a very elegant algorithm!

What about testing!

Algorithm authors usually illustrate their approach with an example. This is a good way to get started on a small graph and ensure that the algorithm behaves as the original author expected. This is the kind of unit test I start with.

The next step is to apply the algorithm to a large number of graph instances. Unfortunately, I do not have any other k-shortest path algorithm at hand, so an oracle is harder to build here. Nevertheless, the result of the algorithm, i.e. the collection of shortest paths, has a couple of properties that should always hold:

  • the algorithm produces loopless paths,
  • the weight of path k is lower than or equal to the weight of path k + 1.

The problem with this test is that it does not detect whether some shortest paths have been missed. At this point, I’m a bit puzzled about how to test that.
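Even without a second implementation, the two properties above make a cheap mechanical oracle; here is a sketch of such a checker (a hypothetical helper, not QuickGraph code):

```csharp
using System.Collections.Generic;

public static class RankedPathsOracle
{
    // paths: each path as its vertex sequence; weights: the matching path
    // weights, in the order the algorithm ranked them.
    public static bool Check(List<int[]> paths, double[] weights)
    {
        for (int k = 0; k < paths.Count; k++)
        {
            // property 1: paths are loopless (no vertex repeats)
            var seen = new HashSet<int>();
            foreach (var v in paths[k])
                if (!seen.Add(v))
                    return false;
            // property 2: weights are non-decreasing with rank
            if (k > 0 && weights[k - 1] > weights[k])
                return false;
        }
        return true;
    }
}
```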

posted on Sunday, January 04, 2009 11:05:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Thursday, January 01, 2009

When you implement an algorithm that computes an optimal solution (i.e. minimum/maximum with respect to a cost function), how do you test that the solution is actually optimal?

This is the kind of question I face when implementing algorithms in QuickGraph. For example, I was recently looking at the minimum spanning tree (MST) of a graph. While checking that the resulting tree is a spanning tree is easy, checking that it is minimal is not obvious: nobody has written Assert.IsMinimum yet :). Here are a couple of techniques that I found useful along the way:

Input-Output table

The most obvious approach is to pre-compute the result for a number of problems and assert that the solution of the algorithm matches. In the MST case, use a small set of graphs for which the MST is well known, and check that the computed MST has the correct weight. This approach will take you only so far and requires a lot of manual work, since you need to solve the problem (or find a known solution) for a number of representative cases.

Solution Perturbation

If the algorithm computes a solution that is minimal with respect to a cost function, one can try to perturb the solution to see if there is a smaller one. If so, this clearly violates the fact that the solution should be minimal, so you just found a bug. In the case of the MST, this means randomly swapping edges that are not in the minimum spanning tree with tree edges and evaluating whether the resulting tree is smaller.

This kind of approach is actually used in optimization, where the search space might have local minima (see simulated annealing).
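For the MST case, the perturbation can be made concrete: for each non-tree edge, swapping it with any tree edge on the tree path between its endpoints yields another spanning tree; if any such swap lowers the total weight, the tree was not minimal. A self-contained sketch under that assumption (illustrative names, not QuickGraph’s API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MstPerturbation
{
    // Edges as (u, v, weight). Returns true if no single swap between a
    // non-tree edge and a tree edge on the corresponding cycle improves the
    // total weight -- a necessary condition for minimality.
    public static bool IsSwapOptimal(
        List<Tuple<int, int, double>> treeEdges,
        List<Tuple<int, int, double>> nonTreeEdges)
    {
        double treeWeight = treeEdges.Sum(e => e.Item3);
        foreach (var candidate in nonTreeEdges)
        {
            // The tree path between the candidate's endpoints...
            var path = TreePath(treeEdges, candidate.Item1, candidate.Item2);
            // ...try replacing each tree edge on it by the candidate.
            foreach (var treeEdge in path)
                if (treeWeight - treeEdge.Item3 + candidate.Item3 < treeWeight)
                    return false; // a cheaper spanning tree exists: not minimal
        }
        return true;
    }

    // Tree edges on the unique tree path between a and b (DFS on the tree).
    static List<Tuple<int, int, double>> TreePath(
        List<Tuple<int, int, double>> treeEdges, int a, int b)
    {
        var adj = new Dictionary<int, List<Tuple<int, int, double>>>();
        foreach (var e in treeEdges)
        {
            if (!adj.ContainsKey(e.Item1)) adj[e.Item1] = new List<Tuple<int, int, double>>();
            if (!adj.ContainsKey(e.Item2)) adj[e.Item2] = new List<Tuple<int, int, double>>();
            adj[e.Item1].Add(e);
            adj[e.Item2].Add(e);
        }
        var path = new List<Tuple<int, int, double>>();
        Dfs(adj, a, b, int.MinValue, path);
        return path;
    }

    static bool Dfs(Dictionary<int, List<Tuple<int, int, double>>> adj,
        int current, int goal, int parent, List<Tuple<int, int, double>> path)
    {
        if (current == goal) return true;
        foreach (var e in adj[current])
        {
            int next = e.Item1 == current ? e.Item2 : e.Item1;
            if (next == parent || next == current) continue;
            path.Add(e);
            if (Dfs(adj, next, goal, current, path)) return true;
            path.RemoveAt(path.Count - 1);
        }
        return false;
    }
}
```

Note this only checks single-edge swaps, so it can miss non-minimal trees that need several swaps to improve; like simulated annealing neighborhoods, it is a cheap necessary condition rather than a proof of optimality.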

Multiple Implementations

A nice thing about graph problems is that there are usually many (vastly) different ways to solve them. If multiple implementations are available, we can simply compare the results of the algorithms and make sure that they match. Each algorithm might have bugs, but it is unlikely that they share common ones. Since we now have a good oracle, we can apply this approach to a large number of inputs to increase our coverage.

In the MST case, two popular algorithms are Prim’s and Kruskal’s. These are two different approaches: Prim’s is built on top of Dijkstra’s single-source shortest path algorithm, while Kruskal’s is built on top of the disjoint-set data structure. By carefully picking the weights of the edges, we can assert different things:

  • if the edge weights are all equal, any spanning tree is minimal. So we can compare the result to a depth-first-search algorithm (which can easily compute a spanning tree),
  • if some edge weights are different, there may be many minimum spanning trees. In this case, we can still assert that the weight of the tree is minimal,
  • if all the edge weights are different, then the MST is unique. This fact can be used to precisely pinpoint differences in solutions during testing.

There is a corner case that needs to be checked: if all the algorithms are no-ops, i.e. they don’t do anything, their solutions will always match!

Happy New Year!

posted on Thursday, January 01, 2009 9:03:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, December 30, 2008

I recently implemented a disjoint-set data structure in QuickGraph; it is the main building block of Kruskal’s minimum spanning tree algorithm. This is a fun data structure to look at, as its purpose is quite different from that of the ‘main’ BCL collections. The disjoint-set is useful for partitioning elements into sets, and defines two main operations to that purpose: Find finds the set an element belongs to (and can be used to check whether two elements are in the same set), and Union merges two sets.
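A minimal sketch of those two operations (illustrative only; QuickGraph’s ForestDisjointSet adds union-by-rank and path compression on top of this naive version):

```csharp
using System.Collections.Generic;

// Minimal illustrative disjoint-set over arbitrary elements.
public class SimpleDisjointSet<T>
{
    readonly Dictionary<T, T> parent = new Dictionary<T, T>();

    // Adds x as a new singleton set (x is its own representative).
    public void MakeSet(T x) { parent[x] = x; }

    // Returns the representative of x's set by walking up the parent chain.
    public T Find(T x)
    {
        return EqualityComparer<T>.Default.Equals(parent[x], x)
            ? x : Find(parent[x]);
    }

    // Merges the sets of a and b; returns false if they were already one set.
    public bool Union(T a, T b)
    {
        T ra = Find(a), rb = Find(b);
        if (EqualityComparer<T>.Default.Equals(ra, rb)) return false;
        parent[ra] = rb;
        return true;
    }
}
```

Two elements share a set exactly when their Find results are equal, which is how Kruskal’s algorithm detects that adding an edge would create a cycle.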

The source of the disjoint-set is in the QuickGraph source, if you are curious about the details.

Testing the disjoint-set

There is not much left to TDD when it comes to writing data structures. Such data structures are usually described in detail in an article, and you ‘just’ have to follow the author’s instructions to implement them. Nevertheless, unit testing is really critical to ensure that the implementation is correct – but there is a risk of having to re-implement the algorithm to test it.

For the disjoint-set, I used two tools developed at RiSE: Code Contracts and… Pex. When implementing a data structure from the literature, I usually start by ‘dumping’ the code and the contracts. As much as possible, any invariant or property that the author describes should be translated into contracts. This gives Pex more opportunities to find bugs in my code.

Contracts First

For example, the contracts for the Union method look as follows:

private bool Union(Element left, Element right)
{
    Contract.Requires(left != null);
    Contract.Requires(right != null);
    Contract.Ensures(Contract.Result<bool>()
        ? Contract.OldValue(this.SetCount) - 1 == this.SetCount
        : Contract.OldValue(this.SetCount) == this.SetCount);
    Contract.Ensures(this.FindNoCompression(left) == this.FindNoCompression(right));
    // implementation elided
}

The first two requires clauses do the usual null checks. The first ensures clause checks that if Union returns true, a merge has been done and the number of sets has decreased by one. The last ensures clause checks that left and right belong to the same set at the end of the union.

Note that so far, I did not have to provide any kind of implementation details. The other methods in the implementation receive the same treatment.

A Parameterized Unit Test

I wrote a single parameterized unit test while writing/debugging the implementation of the forest. It could probably have been refactored into many smaller tests, but for the sake of laziness, I used a single one.

The parameterized unit test implements a common scenario: add elements to the disjoint-set, then apply a bunch of union operations. Along the way, we can rely on test assertions and code contracts to check the correctness of the implementation.

Let’s start with the test method signature:

public void Unions(int elementCount, [PexAssumeNotNull]KeyValuePair<int, int>[] unions)
{

The test takes the number of elements to add and a sequence of unions to apply. The input data needs to be refined to be useful. To that end, we add assumptions, in the form of calls to PexAssume, to tell Pex how it should shape the input data. In this case, we want to ensure that elementCount is positive and relatively small, and that the values in unions are within the [0..elementCount) range.

    PexAssume.IsTrue(0 < elementCount);
    PexAssume.TrueForAll(unions,
        u => 0 <= u.Key && u.Key < elementCount &&
             0 <= u.Value && u.Value < elementCount);

Now that we have some data, we can write the first part of the scenario: filling the disjoint-set. To do so, we simply add the integers from [0..elementCount). Along the way, we check that Contains, ElementCount and SetCount all behave as expected:

    var target = new ForestDisjointSet<int>();
    // fill up with 0..elementCount - 1
    for (int i = 0; i < elementCount; i++)
    {
        target.MakeSet(i);
        Assert.IsTrue(target.Contains(i));
        Assert.AreEqual(i + 1, target.ElementCount);
        Assert.AreEqual(i + 1, target.SetCount);
    }

The second part gets more interesting. Each element of the unions array is a ‘union’ action between 2 elements:

    // apply Union to each pair in unions
    for (int i = 0; i < unions.Length; i++)
    {
        var left = unions[i].Key;
        var right = unions[i].Value;

        var setCount = target.SetCount;
        bool unioned = target.Union(left, right);
There are two things we want to assert here: first, that left and right now belong to the same set; second, that SetCount has been updated correctly: if unioned is true, it should have decreased by one.

        // should be in the same set now
        Assert.IsTrue(target.AreInSameSet(left, right));
        // if unioned, the set count decreased by 1
        PexAssert.ImpliesIsTrue(unioned, () => setCount - 1 == target.SetCount);
    }

From this parameterized unit test, I could work on the implementation, refining it again and again until all tests passed (I did not write any other test code).

Happy Ending

The resulting test suite generated by Pex is summarized in the screenshot below: the number of elements does not really matter; what is interesting is the sequence of unions performed. This test suite achieves 100% coverage of the methods under test :).


In fact, the Union method involves some tricky branches to cover, due to optimizations in the disjoint-set. Pex managed to generate unit tests for each of the branches.


Thanks to the contracts, the test assertions, and the high code coverage of the generated test suite, I now have good confidence that my code is properly tested. The End!

(Next time, we will talk about testing minimum spanning tree implementations…)

posted on Tuesday, December 30, 2008 11:41:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, December 18, 2008

My second son Emile de Halleux was born on Monday, weighing 4.204 kg and with lots of hair. He and his mother are enjoying a nice hot fire while the winter storm is slamming Seattle (glad he decided to come out before the icy mess). Happy Holidays!

posted on Thursday, December 18, 2008 4:19:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]
# Thursday, December 04, 2008

I just released a new version of QuickGraph. Among many small improvements here and there, the big news is:

  • .NET 2.0 support is back! QuickGraph now builds for both .NET 2.0 and .NET 3.5 (with extension methods),
  • the Dijkstra algorithm uses a Fibonacci heap under the hood, which is way more efficient. Courtesy of Roy Williams!
  • better serialization support and other bug fixes.

Enjoy, Peli.

posted on Thursday, December 04, 2008 12:28:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, December 02, 2008

Very exciting: I just published my first video on Channel 9. In the future, I’ll be posting more interviews about researchers and projects that happen in our group in Redmond. Expect to get more videos on Pex, Code Contracts, CHESS, etc. over there.


posted on Tuesday, December 02, 2008 12:53:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 25, 2008

We just published a tutorial on the code digger on CodeProject, enjoy…


posted on Tuesday, November 25, 2008 8:13:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]