Friday, December 16, 2011

Application specific software metrics

Image available under creative commons license from http://www.flickr.com/photos/enrevanche/2991963069/

How to make developers like software metrics

The problem

Managers like metrics. Almost every software project beyond a certain size is characterized by metrics. According to Wikipedia,

A software metric is a measure of some property of a piece of software or its specifications. <...> The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments.

Typical OO software metrics include things like "lines of code", "number of modules per 1000 lines of code", "number of comments", "McCabe complexity", "coupling", and "fan-in" or "fan-out" of classes. The problem with many such general metrics is that they describe (aspects of) the state of your software, but they don't tell you how to go about improving it. At most they give vague hints, if any at all: I would hope it's clear to experienced developers that adding lines of code is not something to actively strive for, unless you get a bonus for each line of code you write. Does a high fan-in mean high coupling, or rather good code reuse? Does a high number of comments automatically imply they are relevant and up-to-date? It could just as well be a sign that your codebase is so hard to understand that one needs to include half a manual with each line of code.

An alternative

What can be done to make sure that developers like metrics too? In my opinion, we have to carefully craft our metrics so that they fulfill three basic properties:
  • Single number: it should be possible to summarize each metric in a single, meaningful number, and for one specific metric a higher number must consistently mean either a better or a worse result.
  • Concrete action: for each deterioration it must be unambiguously clear how to go about improving it.
  • Automatable: the metrics must be easy (and preferably very cheap) to calculate automatically. Each developer can be warned automatically if they made a metric significantly worse (or rewarded with a positive report if they improved it) after committing changes to the version control system.

You think this sounds like Utopia and probably requires extremely expensive tools? Think again! With minimal effort one can already attain some very useful application-specific metrics.

Examples

Here I list some possible metrics (actually I have implemented all of these and more as a "hobby" project at my day job):

Counting regular expressions

A lot of useful metrics can be built by counting occurrences of regular expressions (a minimal sketch follows the list below). Examples:
  • Counting deprecated constructs: While introducing a new framework in a significant piece of software, there will always be a period where your software features a mixture of both old code and new framework code. Count the API calls of the old code. Anyone adding new calls to the old API is warned to use the new API instead.
  • Counting conditions: if you are removing "if (predicate) doSomething;" statements and replacing them with polymorphism, count how often the old predicates are called. Anyone who adds new calls to the predicate can be warned automatically and pointed to the polymorphic replacement instead.
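As a minimal sketch of the first kind of counter: assuming a tree of C++ sources and a hypothetical deprecated call named oldApiCall (both the pattern and the file glob are placeholders to adapt to your own codebase), the whole metric fits in a few lines of Python:

```python
import re
from pathlib import Path

# Hypothetical example: "oldApiCall" stands in for whatever deprecated
# construct you want to phase out; adapt the pattern and glob to your codebase.
DEPRECATED_CALL = re.compile(r'\boldApiCall\s*\(')

def count_deprecated(root):
    """Return a single number: the total occurrences of the deprecated call."""
    total = 0
    for source in Path(root).rglob('*.cpp'):
        total += len(DEPRECATED_CALL.findall(source.read_text(errors='ignore')))
    return total

if __name__ == '__main__':
    print(count_deprecated('src'))
```

That single number is the metric: it should only ever go down, and any commit that makes it go up can trigger a warning.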

Monitoring dependencies between projects and coupling between classes

If your software has a layered structure, you will typically have constraints about which projects are allowed to include from which other projects. Count and list all violations by analyzing your include dependencies. I also use include dependencies to get a rough estimate of the coupling between classes and of the impact of adding/removing #includes (by calculating how many extra statements will have to be compiled as a result of the new #include). (Shameless self-plug: you can use my FOSS pycdep tool for this.)
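This is not how pycdep works internally, but as a rough sketch of the idea: assuming a hypothetical layering table and includes written as #include "project/header.h", the violation count can be computed along these lines:

```python
import re
from pathlib import Path

# Hypothetical layering rules: each project may only include headers from the
# projects listed for it. Adapt the table (and the assumption that every
# include path starts with the project name) to your own architecture.
ALLOWED = {
    'gui':   {'gui', 'logic', 'util'},
    'logic': {'logic', 'util'},
    'util':  {'util'},
}

INCLUDE = re.compile(r'#include\s+"([^"]+)"')

def project_of(path):
    """Assume the first path component names the project."""
    return Path(path).parts[0]

def layering_violations(root):
    """Return (file, included header) pairs that break the layering rules."""
    root = Path(root)
    violations = []
    for source in root.rglob('*.cpp'):
        src_project = project_of(source.relative_to(root))
        for match in INCLUDE.finditer(source.read_text(errors='ignore')):
            dst_project = project_of(match.group(1))
            if dst_project not in ALLOWED.get(src_project, set()):
                violations.append((str(source), match.group(1)))
    return violations

if __name__ == '__main__':
    violations = layering_violations('src')
    for src, dst in violations:
        print(f'layering violation: {src} includes {dst}')
    print(f'{len(violations)} violations in total')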

Monitoring McCabe complexity

If you add new "if" statements, you can be warned automatically about the increased complexity. This can be automated using a tool like sourcemonitor, which has command line options that allow you to bypass its GUI and integrate it into your own flow.
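If you want something even simpler that needs no external tool, a very crude stand-in is to count branching keywords per file. This is not true McCabe complexity, just a hypothetical approximation that yields a single trackable number:

```python
import re
from pathlib import Path

# Very rough stand-in for cyclomatic complexity: count branching keywords.
# A real tool does this properly; this only yields a single number per file
# that tends to move in the same direction as McCabe complexity.
BRANCH = re.compile(r'\b(if|for|while|case|catch)\b')

def branch_count(path):
    return len(BRANCH.findall(Path(path).read_text(errors='ignore')))

def report(root):
    """Print a per-file branch count and the total for the whole tree."""
    total = 0
    for source in sorted(Path(root).rglob('*.cpp')):
        count = branch_count(source)
        total += count
        print(f'{count:5d}  {source}')
    print(f'{total:5d}  TOTAL')

if __name__ == '__main__':
    report('src')
```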

Unit tests

Check the results of your unit tests after each commit. Anyone breaking one or more tests is warned automatically and asked to fix them.
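A minimal sketch of such a check, assuming a Python project whose tests are discoverable by the standard unittest runner (the actual mail to the committer is left out):

```python
import subprocess
import sys

def run_tests():
    """Run the unit tests and return (passed, combined output)."""
    # Assumes tests are discoverable by the standard unittest runner in a
    # "tests" directory; substitute your own test command here.
    result = subprocess.run(
        [sys.executable, '-m', 'unittest', 'discover', '-s', 'tests'],
        capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

if __name__ == '__main__':
    passed, output = run_tests()
    if passed:
        print('All unit tests pass.')
    else:
        # In a real post-commit hook you would mail this output to the committer.
        print('Unit tests broken by the last commit:')
        print(output)
        sys.exit(1)
```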

Using it in real life

Of course, no one is stopping you from adding some machinery to these basic tools: run the metrics automatically on every commit, preferably generating incremental results (i.e. the metrics measure what changed compared to the previous commit, so you get a clear idea of the impact of your changes), generate diffs, distribute the work over different computers and collect the results using some suitable framework, and make the reports available using a web application created in an easy-to-use web application framework.

In my day job I have set up two such systems running in parallel. The first system sends one email a day, summarizing all changes in all metrics compared to yesterday's version of the software (or, for some metrics, the changes that took place since the start of the new sprint). The comparisons happen by comparing the metrics tool reports with reference reports. Reference reports have to be updated explicitly (via the web front-end) and annotated with a reason and a rating (improvement/deterioration/status quo). All team members get this report, so if someone did a really good job of cleaning up code, everyone in the team becomes aware of it (and if he did a really lousy job, there might be some social pressure to get it right ;) ). The second system calculates incremental metrics per commit, sends email reports to the committer only, but makes the reports available to interested viewers on the intranet (together with author, commit message and revision number).
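The comparison step itself does not need to be fancy. A hypothetical sketch, assuming each report is stored as a flat JSON mapping from metric name to value:

```python
import json

def load(path):
    """A report is assumed to be a flat JSON mapping: metric name -> value."""
    with open(path) as f:
        return json.load(f)

def compare(reference_path, current_path):
    """Yield (metric, old value, new value) for every metric that changed."""
    reference = load(reference_path)
    current = load(current_path)
    for metric in sorted(set(reference) | set(current)):
        old, new = reference.get(metric), current.get(metric)
        if old != new:
            yield metric, old, new

if __name__ == '__main__':
    for metric, old, new in compare('reference.json', 'current.json'):
        print(f'{metric}: {old} -> {new}')
```

Every reported change then either triggers a warning or, if the new value is accepted, an explicit update of the reference report.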

Although such a system can sound scary (I named it "Big Brother"), in practice we only use it to improve both code and team quality, and an anonymous poll showed that, without exception, every developer liked it (no one wants to dive into someone else's lousy code :) ). Reports with significant changes (good or bad) are discussed in the team at the daily standup meeting, and can identify misunderstandings about the architecture of the code, or result in ideas for organizing training sessions.

Caution

Relying solely on such metrics can give a false sense of software quality: one tends to improve only what is measured. Code review by experienced team members remains a good addition to all of the above.

If you, dear reader, have ideas for other metrics, or remarks about the contents of this or other blog entries, feel free to comment.
