# Thursday, January 08, 2009

Following the implementation of the disjoint-set, another fun algorithm to implement was the offline least common ancestor problem. Tarjan proposed an algorithm based on the disjoint-set and a DFS traversal. Given that I now have all those ingredients in QuickGraph, it was just a matter of hooking up the implementation with the right events. The meat of the implementation looks as follows (and looks quite elegant to me):

var gpair = GraphExtensions.ToAdjacencyGraph<TVertex, TEdge>(this.pairs); // for fast lookup
var disjointSet = new ForestDisjointSet<TVertex>();
var vancestors = new Dictionary<TVertex, TVertex>();
var dfs = new DepthFirstSearchAlgorithm<TVertex, TEdge>(this, this.VisitedGraph,
    new Dictionary<TVertex, GraphColor>(this.VisitedGraph.VertexCount));

dfs.InitializeVertex += (sender, args) => disjointSet.MakeSet(args.Vertex);
dfs.DiscoverVertex += (sender, args) => vancestors[args.Vertex] = args.Vertex;
dfs.TreeEdge += (sender, args) =>
    {
        var edge = args.Edge;
        disjointSet.Union(edge.Source, edge.Target);
        vancestors[disjointSet.FindSet(edge.Source)] = edge.Source;
    };
dfs.FinishVertex += (sender, args) =>
    {
        foreach (var e in gpair.OutEdges(args.Vertex))
            if (dfs.VertexColors[e.Target] == GraphColor.Black)
                this.ancestors[EdgeExtensions.ToVertexPair(e)] = vancestors[disjointSet.FindSet(e.Target)];
    };

// go!
dfs.Compute(root);
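For the curious, the same idea can be sketched as a stand-alone Python function (a hypothetical simplification, not QuickGraph's code: the tree is a child-adjacency dictionary, and the union-find is a minimal path-halving implementation):

```python
# Tarjan's offline LCA: one DFS over the tree plus a disjoint-set (union-find).
class DisjointSet:
    def __init__(self):
        self.parent = {}

    def make_set(self, v):
        self.parent[v] = v

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        self.parent[self.find(b)] = self.find(a)

def offline_lca(tree, root, pairs):
    """tree: {vertex: [children]}; pairs: list of (u, v) queries.
    Returns {frozenset((u, v)): lca} for every query pair."""
    ds = DisjointSet()
    ancestor, colored, answers = {}, set(), {}
    queries = {}
    for u, v in pairs:
        queries.setdefault(u, []).append(v)
        queries.setdefault(v, []).append(u)

    def dfs(u):
        ds.make_set(u)
        ancestor[u] = u
        for child in tree.get(u, []):
            dfs(child)
            ds.union(u, child)
            ancestor[ds.find(u)] = u     # the set's representative maps to u
        colored.add(u)                   # u is finished ("black")
        for v in queries.get(u, []):
            if v in colored:             # both endpoints finished: answer is ready
                answers[frozenset((u, v))] = ancestor[ds.find(v)]

    dfs(root)
    return answers
```

The QuickGraph events map directly onto this sketch: InitializeVertex is MakeSet, DiscoverVertex seeds the ancestor table, TreeEdge performs the Union, and FinishVertex answers the pending pair queries.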

Enjoy!

posted on Thursday, January 08, 2009 12:32:54 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Sunday, January 04, 2009

The k-shortest paths problem computes the first k shortest paths between two vertices in a directed graph. If you put this in the context of route planning, it gives you k alternative routes in case the shortest path is blocked by snow :) While the single-source shortest path problem is well known and implemented in many languages, there are not many implementations available for this problem, although it has been extensively studied. After looking around, I stumbled on a nice article comparing various approaches. The authors pointed out an algorithm from 1959 by Hoffman and Pavley that solves this problem (there are actually many others). This algorithm looked like a good fit:

  • it requires a single call to a single-source shortest path algorithm. Other approaches may require as many as kn calls to a shortest path algorithm.
  • it sounded simple and did not require new specialized data structures.

I want to try it!

The algorithm is available in QuickGraph 3.1.40104.00 and up. You can take a look at it at http://www.codeplex.com/quickgraph/SourceControl/changeset/view/29858#377982. To use it on a BidirectionalGraph,

IBidirectionalGraph<TVertex, TEdge> g = …;
foreach(IEnumerable<TEdge> path in g.RankedShortestPathHoffmanPavley(weights, source, goal, 4))
     …

A glimpse at the Hoffman-Pavley algorithm

The algorithm works in 2 phases.

In the first phase, we build a minimum ‘successor’ tree towards the goal vertex. This tree can be used to build a shortest path from any (reachable) vertex to the goal vertex. To build this tree, we simply apply Dijkstra’s shortest path algorithm on the reversed graph. This can be done in a couple of lines with QuickGraph.

As with many other k-shortest path algorithms, the second phase works by building deviation paths and picking the best one. In the case of the Hoffman-Pavley algorithm, it goes as follows: pick the latest shortest path; for each vertex of this path, build a deviation path (more on this later) for each out-edge and add it to a priority queue. Then start again:

var queue = new PriorityQueue<Path>();
var paths = new List<Path>();
queue.Enqueue(shortestPath);
while (queue.Count > 0 && paths.Count < k)
{
    var path = queue.Dequeue();
    paths.Add(path);
    foreach (var vertex in path)
        foreach (var edge in graph.OutEdges(vertex))
            queue.Enqueue(CreateDeviation(path, vertex, edge));
}

A deviation path is composed of three parts:

  1. the initial part of the ‘seeding’ path, i.e. the edges before the deviation edge,
  2. the deviation edge
  3. the remaining shortest path to the goal.

When we build a deviation path, we also compute its weight. A nice property of deviation paths is that they can be ‘built’ lazily, when needed. This saves a lot of space and computation, as most deviation paths will probably not end up in the winning set of paths – instead of storing a full path, we store a path index and an edge index.

That’s it! The details involve more code to deal with self-edges and loops, but the main idea is there. This is definitely a very elegant algorithm!
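To make the two phases concrete, here is a hypothetical, self-contained Python sketch (my own simplified names and data shapes, not the QuickGraph implementation): phase 1 runs Dijkstra on the reversed graph to get distances-to-goal and a successor tree; phase 2 grows deviation paths from a priority queue. For brevity it stores full paths instead of (path index, edge index) pairs.

```python
import heapq

def dijkstra_to_goal(radj, weights, goal):
    """Distance from every vertex *to* goal plus a successor map,
    computed by running Dijkstra over the reversed adjacency radj."""
    dist, succ = {goal: 0.0}, {}
    pq = [(0.0, goal)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue
        for u in radj.get(v, []):          # edge u -> v in the original graph
            nd = d + weights[(u, v)]
            if nd < dist.get(u, float("inf")):
                dist[u], succ[u] = nd, v
                heapq.heappush(pq, (nd, u))
    return dist, succ

def hoffman_pavley(adj, weights, source, goal, k):
    # phase 1: minimum successor tree toward goal (Dijkstra on reversed graph)
    radj = {}
    for u, vs in adj.items():
        for v in vs:
            radj.setdefault(v, []).append(u)
    dist, succ = dijkstra_to_goal(radj, weights, goal)

    def tail(v):                           # shortest path v -> goal from the tree
        path = [v]
        while path[-1] != goal:
            path.append(succ[path[-1]])
        return path

    # phase 2: repeatedly pop the best path and enqueue its deviations
    results = []
    pq = [(dist[source], tail(source), 0)]  # (weight, path, first deviation index)
    while pq and len(results) < k:
        w, path, start = heapq.heappop(pq)
        if len(set(path)) != len(path):     # drop paths containing loops
            continue
        results.append((w, path))
        for i in range(start, len(path) - 1):
            u = path[i]
            prefix_w = sum(weights[(path[j], path[j + 1])] for j in range(i))
            for v in adj.get(u, []):
                if v == path[i + 1] or v not in dist:
                    continue                # skip the tree edge / dead ends
                heapq.heappush(pq, (prefix_w + weights[(u, v)] + dist[v],
                                    path[:i + 1] + tail(v), i + 1))
    return results
```

Note how cheap a deviation is to build: its weight is prefix weight + deviation edge + dist[v], with the suffix read straight off the successor tree.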

What about testing!

Algorithm authors usually illustrate their approach with an example. This is a good way to start on a small graph and ensure that the algorithm behaves as the original author expected. This is the kind of unit test I begin with.

The next step is to apply the algorithm to a large number of graph instances. Unfortunately, I do not have any other k-shortest path algorithm, so the oracle is harder to build here. Nevertheless, the result of the algorithm, i.e. the shortest path collection, has a couple of properties that should always be true:

  • the algorithm produces loopless paths,
  • the weight of path k is lower than or equal to the weight of path k + 1.

The problem with this test is that it does not guarantee that no shortest path has been missed. At this point, I’m a bit puzzled about how to test that.
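Both properties are cheap to check mechanically on every computed path collection. A sketch of such an oracle (a hypothetical helper of mine, with paths as vertex lists and weights as an edge dictionary):

```python
def check_ranked_paths(paths, weights):
    """Sanity-check a ranked shortest-path collection:
    every path is loopless, and path weights are non-decreasing."""
    def cost(path):
        return sum(weights[(path[i], path[i + 1])] for i in range(len(path) - 1))

    for path in paths:
        # loopless: no vertex appears twice
        assert len(set(path)) == len(path), "path contains a loop"
    costs = [cost(p) for p in paths]
    # ranked: weight of path k <= weight of path k + 1
    assert all(a <= b for a, b in zip(costs, costs[1:])), "paths are not ranked"
    return costs
```

This is a partial oracle: it can catch a malformed result, but not a missing path.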

posted on Sunday, January 04, 2009 11:05:14 PM (Pacific Standard Time, UTC-08:00)  #    Comments [1]
# Thursday, January 01, 2009

When you implement an algorithm that computes an optimal solution (i.e. minimum/maximum with respect to a cost function), how do you test that the solution is actually optimal?

This is the kind of question I face when implementing algorithms in QuickGraph. For example, I was recently looking at computing the minimum spanning tree (MST) of a graph. While checking that the result is a spanning tree is easy, checking that it is minimum is not obvious: nobody has written Assert.IsMinimum yet :). Here are a couple of techniques that I found useful along the way:

Input-Output table

The most obvious approach is to pre-compute the result for a number of problems and assert that the solution of the algorithm matches. In the MST case, use a small set of graphs for which the MST is well known, and check that the computed MST has the correct weight. This approach will take you only so far, and requires a lot of manual work, since you need to solve the problem (or find a known solution) for a number of representative cases.

Solution Perturbation

If the algorithm computes a solution that is minimal with respect to a cost function, one can try to perturb the solution to see if there’s a smaller one. If there is, it clearly violates the fact that the solution should be minimal, so you just found a bug. In the case of MST, this means randomly picking edges that are not in the minimum spanning tree, swapping them with tree edges, and evaluating whether the resulting tree is smaller.

This kind of approach is actually used in optimization where the search space might have local minima (see simulated annealing).
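As a sketch of what such a perturbation check could look like (a hypothetical helper, not QuickGraph code), the cycle property makes the random swap systematic: for every non-tree edge, no tree edge on the cycle it closes may be heavier, otherwise swapping the two would yield a cheaper spanning tree.

```python
def assert_mst_cycle_property(edges, tree_edges):
    """Perturbation check: for every non-tree edge (u, v, w), every tree edge
    on the u..v tree path must weigh at most w; otherwise a swap shrinks the tree."""
    adj = {}
    for u, v, w in tree_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def max_on_path(s, t):
        # DFS over the tree; the s..t path is unique, track its heaviest edge
        stack = [(s, None, 0.0)]
        while stack:
            x, parent, best = stack.pop()
            if x == t:
                return best
            for y, w in adj.get(x, []):
                if y != parent:
                    stack.append((y, x, max(best, w)))
        raise ValueError("tree is not spanning")

    tree_set = {(u, v) for u, v, _ in tree_edges} | {(v, u) for u, v, _ in tree_edges}
    for u, v, w in edges:
        if (u, v) not in tree_set:
            assert w >= max_on_path(u, v), f"swapping in ({u}, {v}) shrinks the tree"
```

If this assertion holds for every non-tree edge, no single-edge perturbation can improve the tree.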

Multiple Implementations

A nice thing about graph problems is that there are usually many (vastly) different ways to solve them. If multiple implementations are available, we can simply compare the results of the algorithms and make sure that they match. Each algorithm might have bugs, but it is unlikely that they share common ones. Since we now have a good oracle, we can apply this approach to a large number of inputs to increase our coverage.

In the MST case, two popular algorithms are Prim’s and Kruskal’s. These are two quite different approaches: Prim’s is built on top of Dijkstra’s single-source shortest path algorithm, while Kruskal’s is built on top of the disjoint-set data structure. By carefully picking the weights of the edges, we can assert different things:

  • if the edge weights are all equal, any spanning tree is minimal. So we can compare the result to a depth-first-search algorithm (which can easily compute a spanning tree).
  • if some edge weights are different, there may be many minimum spanning trees. In this case, we can still assert that the computed trees have the same total weight.
  • if all the edge weights are different, then the MST is unique. This fact can be used to precisely pinpoint differences between the solutions during testing.

There is a corner case that needs to be checked: if all the algorithms are no-ops, i.e. they do not do anything, their solutions will always match!
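A minimal sketch of such a cross-check oracle, with toy Prim and Kruskal implementations (illustrative Python, not QuickGraph’s code): both compute the MST weight and must agree.

```python
import heapq

def kruskal(n, edges):
    """Kruskal via a disjoint-set; edges are (weight, u, v) triples."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    total = 0.0
    for w, u, v in sorted(edges):          # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total

def prim(n, edges):
    """Prim via a priority queue, grown from vertex 0."""
    adj = {i: [] for i in range(n)}
    for w, u, v in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    seen, total = {0}, 0.0
    pq = list(adj[0])
    heapq.heapify(pq)
    while pq and len(seen) < n:
        w, v = heapq.heappop(pq)
        if v in seen:
            continue
        seen.add(v)
        total += w
        for e in adj[v]:
            heapq.heappush(pq, e)
    return total

def cross_check(n, edges):
    """Oracle: two independent implementations must produce the same weight."""
    a, b = kruskal(n, edges), prim(n, edges)
    assert a == b, f"implementations disagree: {a} != {b}"
    return a
```

Neither implementation is a no-op here, so an agreement on the total weight is meaningful evidence.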

Happy New Year!

posted on Thursday, January 01, 2009 9:03:14 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, December 30, 2008

I recently implemented a disjoint-set data structure in QuickGraph; it is the main building block of Kruskal’s minimum spanning tree algorithm. This is a fun data structure to look at, as its purpose is quite different from that of the ‘main’ BCL collections. The disjoint-set is useful to partition elements into sets, and defines two main operations to that purpose: Find, which finds the set an element belongs to (and can be used to check whether two elements are in the same set), and Union, which merges two sets.

The source of the disjoint-set is in the QuickGraph source, if you are curious about the details.

Testing the disjoint-set

There is not much left to TDD when it comes to writing data structures. Such data structures are usually described in detail in an article, and you ‘just’ have to follow the author’s instructions to implement them. Nevertheless, unit testing is really critical to ensure that the implementation is correct – but there is a risk of having to re-implement the algorithm to test it.

For the disjoint-set, I used 2 tools developed at RiSE: Code Contracts and… Pex. When implementing a data structure from the literature, I usually start by ‘dumping’ the code and the contracts. As much as possible, any invariant or property that the author describes should be translated to contracts. This gives Pex more opportunities to find bugs in my code.

Contracts First

For example, the contracts for the Union method look as follows:

private bool Union(Element left, Element right)
{
    Contract.Requires(left != null);
    Contract.Requires(right != null);
    Contract.Ensures(Contract.Result<bool>() 
        ? Contract.OldValue(this.SetCount) - 1 == this.SetCount
        : Contract.OldValue(this.SetCount) == this.SetCount
        );
    Contract.Ensures(this.FindNoCompression(left) == this.FindNoCompression(right));

    // ...implementation...
}

The first two Requires clauses do the usual null checks. The first Ensures clause checks that if Union returns true, a merge has been done and the number of sets has decreased by 1. The last Ensures checks that left and right belong to the same set at the end of the union.

Note that so far, I did not have to provide any kind of implementation details. The other methods in the implementation receive the same treatment.

A Parameterized Unit Test

I wrote a single parameterized unit test while writing/debugging the implementation of the forest. It could probably have been refactored into many smaller tests, but for the sake of laziness, I used a single one.

The parameterized unit test implements a common scenario: add elements to the disjoint-set, then apply a bunch of union operations. Along the way, we rely on test assertions and code contracts to check the correctness of the implementation.

Let’s start with the test method signature:

[PexMethod]
public void Unions(int elementCount, [PexAssumeNotNull]KeyValuePair<int,int>[] unions) {

The test takes the number of elements to add and a sequence of unions to apply. The raw input data needs to be refined to be useful. To that end, we add assumptions, in the form of calls to PexAssume, to tell Pex how it should ‘shape’ the input data. In this case, we want to ensure that elementCount is positive and relatively small, and that the values in unions are within the [0..elementCount) range.

    PexAssume.IsTrue(0 < elementCount);
    PexSymbolicValue.Minimize(elementCount);
    PexAssume.TrueForAll(
        unions, 
        u => 0 <= u.Key && u.Key < elementCount && 
             0 <= u.Value && u.Value < elementCount
             );

Now that we have some data, we can write the first part of the scenario: filling the disjoint-set. To do so, we simply add the integers from [0..elementCount). Along the way, we check that Contains, ElementCount and SetCount all behave as expected:

    var target = new ForestDisjointSet<int>();
    // fill up with 0..elementCount - 1
    for (int i = 0; i < elementCount; i++)
    {
        target.MakeSet(i);
        Assert.IsTrue(target.Contains(i));
        Assert.AreEqual(i + 1, target.ElementCount);
        Assert.AreEqual(i + 1, target.SetCount);
    }

The second part gets more interesting. Each element of the unions array is a ‘union’ action between 2 elements:

    // apply Union for pairs unions[i], unions[i+1]
    for (int i = 0; i < unions.Length; i++)
    {
        var left = unions[i].Key;
        var right= unions[i].Value;

        var setCount = target.SetCount;
        bool unioned = target.Union(left, right);

There are two things we want to assert here: first, that left and right now belong to the same set; second, that the SetCount has been updated correctly: if unioned is true, it should have decreased by one.

        // should be in the same set now
        Assert.IsTrue(target.AreInSameSet(left, right));
        // if unioned, the count decreased by 1
        PexAssert.ImpliesIsTrue(unioned, () => setCount - 1 == target.SetCount);
    }
}

From this parameterized unit test, I could work on the implementation, and refine it again and again until all tests passed (I did not write any other test code).
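For readers without Pex, the same scenario can be expressed as a plain property-style test in Python (a hypothetical mirror of the C# above, with a minimal union-find standing in for ForestDisjointSet):

```python
class ForestDisjointSet:
    """Minimal union-find with a set counter (illustrative, not QuickGraph's)."""
    def __init__(self):
        self.parent = {}
        self.set_count = 0

    def make_set(self, x):
        self.parent[x] = x
        self.set_count += 1

    def contains(self, x):
        return x in self.parent

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.set_count -= 1
        return True

    def are_in_same_set(self, a, b):
        return self.find(a) == self.find(b)

def unions_scenario(element_count, unions):
    """Same scenario as the parameterized test, driven by explicit input."""
    assert element_count > 0                      # the PexAssume assumptions
    assert all(0 <= u < element_count and 0 <= v < element_count
               for u, v in unions)
    ds = ForestDisjointSet()
    for i in range(element_count):                # part 1: fill the set
        ds.make_set(i)
        assert ds.contains(i)
        assert ds.set_count == i + 1
    for u, v in unions:                           # part 2: apply the unions
        before = ds.set_count
        merged = ds.union(u, v)
        assert ds.are_in_same_set(u, v)
        assert ds.set_count == before - (1 if merged else 0)
```

Where Pex explores inputs symbolically, this sketch has to be fed interesting (element_count, unions) values by hand or by a random generator.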

Happy Ending

The resulting test suite generated by Pex is summarized in the screenshot below: the number of elements does not really matter; what is interesting is the sequence of unions performed. This test suite achieves 100% coverage of the methods under test :).

[screenshot: Pex-generated test suite]

In fact, the Union method involved some tricky branches to cover, due to some optimization occurring in the disjoint-set. Pex managed to generate unit tests for each of the branches.

[screenshot: tests covering the branches of Union]

Thanks to the contracts, the test assertions, and the high code coverage of the generated test suite, I now have good confidence that my code is properly tested. The End!

(Next time, we will talk about testing minimum spanning tree implementations…)

posted on Tuesday, December 30, 2008 11:41:45 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Thursday, December 18, 2008

My second son Emile de Halleux was born on Monday, weighing 4,204 g and with lots of hair. He and his mother are enjoying a nice hot fire while the winter storm is slamming Seattle (glad he decided to come out before the icy mess). Happy Holidays!

posted on Thursday, December 18, 2008 4:19:09 PM (Pacific Standard Time, UTC-08:00)  #    Comments [7]
# Thursday, December 04, 2008

I just released a new version of QuickGraph. Among many small improvements here and there, the big news is:

  • .NET 2.0 support is back! QuickGraph now builds for both .NET 2.0 and .NET 3.5 (with extension methods).
  • Dijkstra’s algorithm now uses a Fibonacci heap under the hood, which is much more efficient. Courtesy of Roy Williams!
  • better serialization support and other bug fixes.

Enjoy, Peli.

posted on Thursday, December 04, 2008 12:28:30 AM (Pacific Standard Time, UTC-08:00)  #    Comments [2]
# Tuesday, December 02, 2008

Very exciting – I just published my first video on Channel 9. In the future, I’ll be posting more interviews about researchers and projects in our group in Redmond. Expect more videos on Pex, Code Contracts, CHESS, etc. over there.

http://channel9.msdn.com/posts/Peli/The-RiSE-of-Research-in-Software-Engineering/

posted on Tuesday, December 02, 2008 12:53:26 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Tuesday, November 25, 2008

We just published a tutorial on the code digger on CodeProject, enjoy…

http://www.codeproject.com/KB/testing/pex.aspx

posted on Tuesday, November 25, 2008 8:13:29 AM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Friday, November 14, 2008

We will give a session on Pex at the Seattle Code Camp – Saturday 11/15 at 2:45pm. We will show the new Code Digger and how Pex can help you…

posted on Friday, November 14, 2008 1:35:49 PM (Pacific Standard Time, UTC-08:00)  #    Comments [0]
# Wednesday, November 05, 2008

Check out the session on Code Contracts and Pex on Channel 9. You will learn about the cool new API to express pre-conditions, post-conditions and invariants in your favorite language – i.e. design by contract (DbC) for .NET – and the new Code Digger experience in Pex, and most importantly how DbC and Pex play well together.

http://channel9.msdn.com/pdc2008/tl51/


posted on Wednesday, November 05, 2008 11:10:22 PM (Pacific Standard Time, UTC-08:00)  #    Comments [4]