# Wednesday, May 19, 2004

As most of you should know by now, PageRank is the ranking system used by Google to estimate the importance of a page (you can see it in the Google toolbar). Of course, since the basic idea of the algorithm was published they may have been some significant modifications. In this blog, I'll how we can use QuickGraph to compute the PageRank of a graph...

PageRank

The idea behind PageRank is simple and intuitive: pages that are important are referenced by other important pages, page importance is distributed to out-edges. There is an important literature on the web that explains PageRank:

The PageRank is computed by using the following iterative formula:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) 

where PR is the PageRank, d is a damping factor usually set to 0.85, C(v) is the number of out edges of v.

PageRank can also be expressed in terms of matrix algebra where it is shown that it is equivalent to finding the eigen values of a sparse matrix (see http://citeseer.ist.psu.edu/kamvar03extrapolation.html). And in fact, the above formula is equivalent to the Power Method (a slow method for finding eigen values).

Where's my LAPACK ?

Until C# (and .Net) has a real and free wrapper around LAPACK, we cannot attack PageRank using matrix algebra. As mentionned above, the formula is equivalent to the Power Method, which is a slow (potentially very slow) method for computing eigen values (convergence rate is |la_1 / la_2| where la_1 is the largest eigen value, la_2 is the second largest eigen value). There are other faster methods (like model reduction) that we could use to speed up things if we had LAPACK (I'm throwing a bottle in the .Net sea here).

Implementation in QuickGraph

Since we have no matrix algebra, the implementation is very basic and unefficient (I'm almost ashamed). This is almost a disclaimer: do not use this algorithm for big graphs, it is potentially slow. The main loop looks like this:

// temporay rank dictionary
VertexDoubleDictionary tempRanks = new VertexDoubleDictionary();
// create filtered graph that removes dangling links
FilteredBidirectionalGraph fg = new FilteredBidirectionalGraph(
    this.VisitedGraph,
    Preds.KeepAllEdges(),
    new InDictionaryVertexPredicate(this.ranks)
    ); 

int iter = 0;
double error = 0;
do
{
    // compute page ranks
    error = 0;
    foreach(DictionaryEntry de in this.Ranks) 
    {
        IVertex v = (IVertex)de.Key;
        double rank = (double)de.Value;

        double r = 0;
        foreach(IEdge e in fg.InEdges(v))
        {
            r += this.ranks[e.Source] / fg.OutDegree(e.Source);
        }
        // add sourceRank and store
        double newRank = (1-this.damping) + this.damping * r;
        tempRanks[v] = newRank;
        // compute deviation
        error += Math.Abs(rank - newRank);
    } 
    // swap ranks
    VertexDoubleDictionary temp = ranks;
    ranks = tempRanks;
    tempRanks = temp; 
    iter++;
// iterate until convergence, or max iteration reached
}while( error > this.tolerance && iter < this.maxIterations);

where ranks is the PR method, damping is the d factor. Note that because we use enumerators, we cannot modify the ranks as we iterate the dictionary, otherwize the enumerator would be invalidated, therefore we use 2 dictionaries and swap between them as we go along.

The results

As usual, we use GraphvizAlgorithm to output the results. Here are some graph sample with each corresponding page rank:

posted on Wednesday, May 19, 2004 9:22:00 AM (Pacific Daylight Time, UTC-07:00)  #    Comments [5]
Tracked by:
"benefits goat milk" (online) [Trackback]
Monday, June 06, 2005 4:03:35 AM (Pacific Daylight Time, UTC-07:00)
Does Lutz Roeder's Mapack for .NET (<a target="_new" href="http://www.aisto.com/roeder/dotnet/">http://www.aisto.com/roeder/dotnet/</a>) have the linear algebra functions you need?
John Lam
Monday, June 06, 2005 4:03:36 AM (Pacific Daylight Time, UTC-07:00)
Thanks for the link to Mapack. This should be good for a start.
<br>
<br>There are however 1 gotcha to Mapack, it does not rely on a &quot;fortran&quot; compatible structure. This means that I cannot use &quot;advanced&quot; algorithms as the ones that you can find in ARPACK (<a target="_new" href="http://www.caam.rice.edu/software/ARPACK/">http://www.caam.rice.edu/software/ARPACK/</a>) or SLICOT (<a target="_new" href="http://www.win.tue.nl/niconet/NIC2/slicot.html">http://www.win.tue.nl/niconet/NIC2/slicot.html</a>). Both libraries have algorithm to compute eigen values of large sparse matrix.
Jonathan de Halleux
Monday, June 06, 2005 4:03:36 AM (Pacific Daylight Time, UTC-07:00)
take a look here
<br>
<br><a target="_new" href="http://cpplapack.sourceforge.net/">http://cpplapack.sourceforge.net/</a>
<br>
<br>for a C++ wrapper to the lapack and blas routines. They all use the standard fortran structures.
<br>
<br>I am using it (minimally I admit - my stats etc skills are weak, but getting better!) I am routinely converting __gc[] matrices and vectors back and forth to use these wrappers.
Eric Meyer
Monday, June 06, 2005 4:03:37 AM (Pacific Daylight Time, UTC-07:00)
Thanks for the tip. Actually, you can check out dnanalytics for a nice wrapper around LAPACK.
Jonathan de Halleux
Thursday, July 21, 2005 7:08:45 AM (Pacific Daylight Time, UTC-07:00)
nice site

-tadalafil
Comments are closed.