Sunday, May 22, 2011

Circular dependency detection revisited

In last blog post I mentioned the much more efficient SCC detection algorithm by Tarjan. As it turns out, SWI-PROLOG's clpfd library contains an implementation of the SCC algorithm.
The predicates that implement it are not exposed by the clpfd library though. I got help from the author of the library, Markus Triska, to use the scc algorithm outside the library, and the results are incorporated in pycdep's latest source code.

Cycle detection now is as simple as typing

With this better algorithm, the cycle detection on the 2800+ files and 19000 include dependencies program I mentioned in the previous blog post now only takes 6 seconds.

Thanks again, Markus Triska.

Thursday, May 19, 2011

Double inclusion and circular dependencies

If you thought we were out of neat tricks that can be performed with pycdep, here's some more :)

Detecting header files that are included twice (or more).

Here's a simple prolog program to keep only those elements in a list that occur more than once:

You could use it in a query as follows:

Detecting circular dependencies

Circular dependencies are a code maintainer's nightmare. They can cause all kinds of subtle and less subtle problems (if you get it to compile at least).

To get rid of circular dependencies one can do several things (depending on the situation at hand). Sometimes it suffices to move data structures into a file of their own, where the circularly dependent files can now both include it. Another technique is to avoid #includes and use as much forward declarations as possible.

Before you can solve circular dependencies, you have to be aware of them. Especially if the cycles are big (a includes b which includes c which includes d which includes a), it may not be obvious at first sight that there is a circular dependency. Luckily pycdep is able to tell us about circular dependencies. More precisely, we will calculate the dependency graph's strongly connected components (SCC). A strongly connected component is a subgraph of which each node is reachable from each other node. This is only possible if it contains cycles. Note that an SCC can encompass multiple elementary cycles. In the code shown below, no attempt is made to further decompose the SCC into elementary cycles.

The algorithm shown here is not very efficient (for reasons of efficiency Tarjan's SCC detection algorithm would be more suitable, but it's not as easy to write and understand as the one shown here. Maybe one day... :-) ). I have used it successfully on a 2800+ file code base with around 19000 include dependencies, to find all cycles in a few minutes on a reasonably modern pc.

You can use it as follows:

The program works by noting that two nodes X and Y belong to the same SCC if there's a path from X to Y, and vice versa. The scc(X, Result) predicate calculates the list Result, which contains all nodes that belong to the same SCC as X. The all_scc(R) returns a list of all the SCC found in the include file graph. To perform its magic, it uses a predicate all_scc_helper that starts from a node taken from the list of all eligible nodes, then finds all the nodes belonging to the same SCC. All the nodes that belong to the SCC, are then removed from the list of eligible nodes, and the same steps are done on the remaining nodes.

Note that the program can be sped up by memoizing the path calculation in the implementation of "X depends on Y". Memoization is a technique in which results are calculated once, and then the outcome of the calculation is stored as a new fact in the prolog database. This avoids that the same calculation has to be redone multiple times. Using a better algorithm (like the ones by Tarjan, Kosoraju or Gabow) of course would be a better approach.

Memoization of the path calculation can be achieved by changing the definition of X depends on Y as follows: