Tuesday, November 22, 2011

Semantic diff

The problem

I often have to compare two ASCII reports containing lots of numbers. The two reports contain almost the same values, but small numerical or textual differences may occur here and there.

I'm interested in finding out where the differences are, and in the case of numerical differences, where the biggest difference is.

Until recently I just used a normal diff tool, but then it occurred to me that it should be easy to write what I call a "semantic diff" tool, i.e. a specialized diff tool that better highlights the kind of differences I'm interested in.

Here's an example of what I mean (the tables in real life are much bigger, and hopefully make more sense :) ).

Reference table

Actual table

Python 2.x code

Since both reports have exactly the same layout, it is quite easy to write such a diff tool in Python. I want to highlight changes in text that may occur in the report, as well as color-code the magnitude of the numerical differences between the two reports (if any). Here's one way to do it (warning: I've reduced indentation because of space constraints):

Here's the Mako template:


And finally here's the result, when viewed in a browser. Note that the textual differences are marked in yellow. The numerical differences are marked in a color that varies from blue (smallest difference) to red (biggest difference). Not too bad for what is essentially only a few lines of code.
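
For anyone who wants to experiment, here is a minimal sketch of the idea in Python 3. It is not the Python 2.x listing from this post. It assumes both reports have identical layout and whitespace-separated tokens, builds the HTML with plain string formatting instead of a Mako template, and scales the colors from zero up to the largest difference found. All function names (`to_number`, `compare_reports`, `delta_color`, `render_html`) are hypothetical.

```python
import html
import math


def to_number(token):
    """Return the token as a finite float, or None if it is not numeric."""
    try:
        value = float(token)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def compare_reports(reference, actual):
    """Compare two same-layout reports token by token.

    Returns a list of rows. Each row is a list of (token, kind, delta)
    tuples, where kind is 'same', 'text' or 'num' and delta is the
    absolute numerical difference (0.0 for non-numeric tokens).
    """
    rows = []
    for ref_line, act_line in zip(reference.splitlines(), actual.splitlines()):
        row = []
        for ref_tok, act_tok in zip(ref_line.split(), act_line.split()):
            if ref_tok == act_tok:
                row.append((act_tok, "same", 0.0))
                continue
            ref_num, act_num = to_number(ref_tok), to_number(act_tok)
            if ref_num is None or act_num is None:
                row.append((act_tok, "text", 0.0))
            else:
                row.append((act_tok, "num", abs(act_num - ref_num)))
        rows.append(row)
    return rows


def delta_color(delta, max_delta):
    """Map a difference onto a hex color from blue (small) to red (large)."""
    frac = delta / max_delta if max_delta > 0 else 0.0
    red = int(round(255 * frac))
    return "#{:02x}00{:02x}".format(red, 255 - red)


def render_html(rows):
    """Render the compared rows as an HTML table with highlighted cells."""
    deltas = [d for row in rows for _, kind, d in row if kind == "num"]
    max_delta = max(deltas, default=0.0)
    out = ["<table>"]
    for row in rows:
        cells = []
        for token, kind, delta in row:
            text = html.escape(token)
            if kind == "text":
                # Textual differences are marked in yellow.
                cells.append('<td style="background:yellow">{}</td>'.format(text))
            elif kind == "num":
                # Numerical differences are colored by magnitude.
                color = delta_color(delta, max_delta)
                cells.append(
                    '<td style="color:white;background:{}">{}</td>'.format(color, text)
                )
            else:
                cells.append("<td>{}</td>".format(text))
        out.append("<tr>" + "".join(cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```

Writing the string returned by `render_html(compare_reports(reference_text, actual_text))` to an `.html` file and opening it in a browser gives a view along the lines described above. Scaling from zero rather than from the smallest observed difference is a simplification; either choice keeps the biggest difference red.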