Wednesday, 19 December 2012

These go up to eleven

You're on ten ... Where can you go? Where? ... Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do? [pause] Put it up to eleven.- Nigel Tufnel, This is Spinal Tap

Today, I was reminded of this great movie scene when testing a new piece of code. The problem I was testing was to find the shortest paths from one vertex in a graph to all other connected vertices. Listing one path is easy. Listing all alternate shortest paths is a bit more tricky. The current implementation only provided a single shortest path and there was a bug associated with this problem.

I'm going to skim over the details here but will show some of the API. The patch is current on this Sourceforge tracker if you wish to see the complete implementation. To start, some simple cases such as cubane were tested.

The 6 shortest paths between oposite vertices in a cubane molecule (CHEBI:33014)

With the correct result confirmed, it was time to turn up the complexity.  C60 Fullerene seemed a good choice. Unfortunately, after running it, I was disappointed to find there is at maximum only six possible paths to an opposite vertex. This one is a bit tricky to verify but luckily there is a ball and stick model in the office which made it a little easier. I also generated some spiro compounds but I was looking for a more difficult molecule.

C60 fullerene - CHEBI:33128


I was pretty much done with testing until this morning I stumbled across the FULLERENE program and database which provides a download for a C720 fullerene. Now that is a challenge!


C720 fullerene


Time to turn it up to eleven!

Here is the existing code to reconstruct the path from each atom to every other atom. Note we only do the the comparison one way as it is quicker to reverse the path to get both from i to j and from j to i.

Code 1 - All pairs shortest paths
for(int i = 0, n = fullerene.getAtomCount(); i < n; i++){
    for(int j = i + 1; j < n; j++){
        List path = PathTools.getShortestPath(fullerene,
                                                     fullerene.getAtom(i),
                                                     fullerene.getAtom(j));
    }
}

I only managed to run it once and as it took 3481106 ms (~58 min) to complete. Using the new utility we can do the equivalent in well under a second (~200 ms).

Code 2 - All pairs shortest paths new API
AllShortestPaths asp = new AllShortestPaths(fullerene);
 
for(int i = 0, n = fullerene.getAtomCount(); i < n; i++){
    for(int j = i + 1; j < n; j++){  
        IAtom[] path = asp.from(i).atomsTo(j);
    }
}

The current code is actually just a wrapper around several searches from each atom. Including the path reconstruction in the Floyd-Weshall algorithm (implemented as TopologicalMatrix) is also possible but for simplicity I left the AllShortestPaths as it a simple helper. The atomsTo method can be replaced with the pathTo method which will run even quicker. We can also count the number of shortest paths between each vertex.

Code 3 - Counting all pairs paths
sum += asp.from(i).nPathsTo(j);

From this we find a total of 39,086,040 (double that if you include j to i). We can get the actual paths but this takes significantly longer - around 8000 ms. Now thats pretty slow but we can check the theoretical maximum speed by filling an array the size of each shortest paths set.

Code 4 - storing all pairs paths
int[][] paths = new int[asp.from(i).nPathsTo(j)][asp.from(i).distanceTo(j) + 1];
for(int k = 0; k < paths.length; k++){
    for(int l = 0; l < paths[k].length; l++) {
        paths[k][l] = k + l;
    }
}


This will take 3000+ ms on it's own. It may be possible to get the time close to that but it is likely that most of the time difference is spent in array copies and traversing the recursive method call. Currently, this isn't worth optimising further. As the API provides a way to check how many shortest paths there are, it is better to only reconstruct all the paths if there is sufficiently few. After all, if you could reconstruct 39 million paths, you've still got to do something else with them afterwards.