Saturday, December 24, 2011

Running YouNeedTests on Windows XP with Visual Studio 2010 Express

Tutorial: running YouNeedTests on Windows XP

The problem

I've updated YouNeedTests, my cross-platform tool for extracting and building C++ unit tests embedded in source code comments, to make it usable on Microsoft Windows systems too. Here's a basic tutorial on how to set it up and get started on Windows.

The tutorial

Installation

YouNeedTests depends on the following tools:
  • Python 3.2 (or newer): get it from the Python website.
  • PyYAML: get it from the PyYAML website. I've used version 3.10.
  • Mako: get it from the Mako templates website. I've installed the latest available version, 0.5.0, using Distribute.
  • CMake: get it from the CMake website.
  • YouNeedTests: you can either install Git and clone the Gitorious repository (this is easy via the installer of Git Extensions), or you can head over to the Gitorious repository and download the master branch as a tar.gz package (in which case you will also need a tool like 7-Zip to unpack it).
  • A C++ compiler. Strictly speaking, this is optional, but obviously YouNeedTests won't be very useful without one :) I've installed Visual Studio 2010 Express edition (a free download) from the Microsoft website.

Note: the Google Test framework is included with the YouNeedTests tool.

Building and running some tests

YouNeedTests comes with a run.bat script that runs the tool on the sample testinputs folder.

  • In a first step, run.bat will delete the testoutput folder if it already exists.
  • In a second step, run.bat will create a testoutput folder with a file called CMakeLists.txt and a series of folders containing automatically generated code (one folder per test suite). run.bat is a very short file, and I encourage you to take a look (and correct the paths to your Python installation if needed).
  • In a third step, run.bat will change the working directory to the testoutput folder and run CMake on it, in order to generate the Visual Studio solution. CMake also supports many other compilers and IDEs - run
    cmake --help
    to get an overview of the supported generators.

When run.bat has finished, you should find a lot of files in your testoutput folder (imagine having to create all of those by yourself to get a feeling for what YouNeedTests can do for you). Open the ALL_TESTS.sln file in Visual Studio 2010 Express. Inside the solution you will find the following projects:

  • The ALL_BUILD project: build this to build the Google Test framework and the tests extracted from the C++ comments.
  • The ALL_TEST project: build this to run all the tests.
  • The t1, t2, t3, ... projects: setting one of these as the startup project will run only that test suite. Running a test suite separately like this gives much more detailed test output. Unfortunately, the console window containing the test results disappears as soon as the test finishes; you may want to run the test executable from a DOS window instead so you can inspect the results.

Troubleshooting

If you run the sample tests by building the ALL_TEST project but get an error like

Could not find executable C:/development/youneedtests/youneedtests/testoutput/t4Avg/t4Avg

it means you forgot to build the ALL_BUILD project first. Right-click ALL_BUILD in the Solution Explorer in Visual Studio and click Build.

If you successfully built the ALL_BUILD project, but while running the sample tests included with YouNeedTests all tests fail with a reason like "OTHER FAILURE", it means that gtest.dll was not found. In that case you should add the

youneedtests\testoutput\googletest\Debug

folder (use the full path as applicable on your system) to your PATH (right-click My Computer, then click Properties, click Advanced, click Environment Variables, and in the System Variables group box, add the folder to the Path variable). An alternative would be for YouNeedTests to copy the .dll files into each of the test suite folders, but I'd rather not do that (the more tests you have, the more copies of gtest.dll would be needed).

Monday, December 19, 2011

YouNeedTests: a Python 3-based C++ unit testing tool

(Image retrieved from http://blog.hinshelwood.com/. According to Google Image Search this image is labeled for reuse. Please let me know if you disagree.)

Unit testing in C++

Perhaps you recognize the following story. You're finished writing C++ code, and now you want to add some unit tests, because it makes the metrics look good. You're using a well-established C++ unit testing framework, like CppUnit. Adding a new test requires manually adding files to the solution, adding seemingly duplicated information in .cpp and .h files (i.e. implementations and declarations) and invoking obscure macros. Or perhaps you copy an existing test file and start editing it to test new stuff. Or worse: you extend an existing test with some extra lines to avoid creating all the boilerplate code over and over again.

However, being a programmer and not a file copier, I would really appreciate it if we could use our computers a bit more efficiently to specify tests.

YouNeedTests: Python 3's doctest meets FitNesse for C++ code

doctest? FitNesse?

The brilliant thing about Python's doctest module is how it extracts tests from comments embedded in the source code. This leads to executable documentation which, by its very nature, never needs to become outdated. Doctest is typically used to write unit tests: tests written by programmers for programmers, trying to test little bits of code in isolation.

The brilliant thing about FitNesse is how it specifies different test scenarios in table form. FitNesse is typically used to define acceptance tests: tests written by business and QA people and intended for business and QA people. Acceptance tests usually involve more objects than unit tests.

Requirements

I wanted a tool that could be used with C++ and that transforms comments embedded in the C++ code into unit tests. I also wanted a way to specify different test scenarios in table form. Because I prefer Python over C++ for text processing, I decided this was an ideal opportunity to get my feet wet with several "hot" and "established" technologies:

  • Git as the source code management system. The brilliant thing about Git is its distributed model, its insane flexibility, speed and compactness, and its superb support for branching and merging.
  • Python 3, the somewhat controversial, non-backward-compatible successor to the very popular Python 2. I still have to find out what's so brilliant about Python 3, but so far it hasn't been a negative experience.
  • YAML to format the comments from which the code will be generated. The brilliant thing about YAML is how it combines the expressiveness of XML with a much more human-friendly syntax: human-friendly enough that I didn't need to create a domain-specific language for specifying tests.
  • googletest (Google Test) as the backend unit testing framework. The brilliant thing about the googletest framework is how little boilerplate code it requires to write simple tests. The second brilliant thing about googletest is that it supports death tests, i.e. you can check whether a piece of code crashes as expected (a minimal example follows this list).
  • CMake for the build system. The brilliant thing about CMake is how cross-platform, easy to use and versatile it is as a build system. It also comes with built-in provision for testing (and loads of stuff I haven't discovered yet).
  • Mako to generate boilerplate code and build scripts. The brilliant thing about Mako templates is their ease of use and the speed with which they are rendered.
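
To make the "little boilerplate" claim a bit more concrete, here is a minimal hand-written googletest example. The avg and divide functions are made up purely for illustration; they are not part of YouNeedTests or googletest:

    #include <cassert>
    #include <gtest/gtest.h>

    // Hypothetical functions under test (illustration only).
    int avg(int a, int b) { return (a + b) / 2; }
    int divide(int a, int b) { assert(b != 0); return a / b; }

    // An ordinary test: one macro, one assertion.
    TEST(AvgTest, AveragesTwoValues) {
        EXPECT_EQ(5, avg(4, 6));
    }

    // A death test: verify that the code aborts as expected
    // (the assert is active in debug builds).
    TEST(DivideDeathTest, AbortsOnDivisionByZero) {
        EXPECT_DEATH(divide(1, 0), "");
    }

    int main(int argc, char** argv) {
        ::testing::InitGoogleTest(&argc, argv);
        return RUN_ALL_TESTS();
    }

If you link against the gtest_main library, even the main() above can be dropped.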

Example

Here's a simple comment that will generate 6 independent tests from a table of scenarios (a sketch of such a comment follows the list below). Not all tests need to have a table; one can also embed raw code if desired. Such a comment consists of several parts:
  • a testsuite name
  • a testsuite type: for now 3 types are supported:
    • COLUMNBASED: CODE section contains a list of tables. Each line in each table is an independent test.
    • ROWBASED: CODE section contains a list of tables. Each table becomes a test. Each line in a table is one step in the test.
    • RAW: CODE section contains normal C++ code.
  • a LINK section: this specifies which .cpp files should be linked in with the test
  • an INCLUDE section: this specifies which .h files to include with the test
  • a STUBS section: this can contain arbitrary C++ code to resolve linker errors. Only stub those parts of the code which you are not testing. There's no point in testing stubs :)
  • a PRE section: code in the PRE section is executed before anything from the table is executed
  • a POST section: code in the POST section is executed after executing statements from the table, but before executing assertions from the table
  • a CODE section. In case of a COLUMNBASED or ROWBASED test suite, the CODE section contains a list of tables. Each table has a table header that specifies code templates with placeholders $1, $2, .... The placeholders will be filled in with values from the table cells in the same column. Each table has a NULL column (the ~ symbol) which separates the statements on a line from the assertions.
    • The line containing the string 'twovalues' in table 'TABLE1', for example, results in one generated statement per value, because the double comma (",,") causes the code template to be unrolled once for each value. Of course the system also generates all the rest of the required boilerplate code and build scripts, finally leading to a complete .cpp file per test suite. Despite this being a toy example, the table format already results in quite some space and boilerplate reduction :)
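
Roughly, such a comment could look like the sketch below (shortened to two table lines). The exact marker, the key names and the Avg class are illustrative assumptions, not the tool's verbatim syntax:

    /*
    YOUNEEDTESTS:                       # marker assumed for illustration
      TESTSUITE: t4Avg                  # the testsuite name
      TYPE: COLUMNBASED                 # COLUMNBASED, ROWBASED or RAW
      LINK: [avg.cpp]                   # .cpp files to link in with the test
      INCLUDE: [avg.h]                  # .h files to include with the test
      STUBS: "void log(const char*) {}" # stub only what you are not testing
      PRE: "Avg a;"                     # executed before the statements of a table line
      POST: ""                          # executed after the statements, before the assertions
      CODE:
        - TABLE1:
            HEADER: ["a.add($1);", ~, "ASSERT_EQ($2, a.average());"]
            ROWS:
              - ["4,,6", ~, "5"]        # the 'twovalues' line: ",," unrolls the template
              - ["8", ~, "8"]
    */
    // ... the ordinary production code continues below the comment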
All the generated tests can be run at once by issuing "make test" in the output folder; more detailed test results are available by running the test executables individually.
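
To give a feel for what gets generated, the table lines from the sketch above could expand into googletest code roughly like the following (again an illustrative sketch; the real generated files contain additional scaffolding and use the tool's own naming conventions):

    #include <gtest/gtest.h>
    #include "avg.h"                    // from the INCLUDE section (file name assumed)

    void log(const char*) {}            // from the STUBS section

    // Independent test generated from the 'twovalues' line of TABLE1.
    TEST(t4Avg, TABLE1_line1) {
        Avg a;                          // PRE section
        a.add(4);                       // "a.add($1);" unrolled once per value,
        a.add(6);                       // because of the double comma in "4,,6"
        ASSERT_EQ(5, a.average());      // assertion from the column after the ~ separator
    }

    // Second, independent test generated from the next table line.
    TEST(t4Avg, TABLE1_line2) {
        Avg a;
        a.add(8);
        ASSERT_EQ(8, a.average());
    }

Each test suite ends up as its own executable, which is why running a test suite separately gives you the detailed per-assertion output.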

Proof of concept code!

Got curious? Hop over to Gitorious! It's LGPL'd proof-of-concept code; for now I have only tried it on Debian sid amd64.

Friday, December 16, 2011

Application-specific software metrics

Image available under a Creative Commons license from http://www.flickr.com/photos/enrevanche/2991963069/

How to make developers like software metrics

The problem

Managers like metrics. Almost every software project above a certain size is characterized by metrics. According to Wikipedia,

A software metric is a measure of some property of a piece of software or its specifications. <...> The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments.

Typical OO software metrics include things like "lines of code", "number of modules per 1000 lines of code", "number of comments", "McCabe complexity", "coupling", and "fan-in" or "fan-out" of classes. The problem with many such general metrics is that they describe (aspects of) the state of your software, but they don't tell you how to go about improving it. At most they give vague hints, if any at all: I would hope it's clear to experienced developers that adding lines of code is not something to actively strive for, unless you get a bonus for each line of code you write. Does a high fan-in mean high coupling, or rather good code reuse? Does a high number of comments automatically imply they are relevant and up-to-date? It could also be a sign that your codebase is so hard to understand that one needs to include half a manual with each line of code.

An alternative

What can be done to make sure that developers like metrics too? In my opinion, we have to carefully craft our metrics so that they fulfill three basic properties:
  • Single number: it should be possible to summarize each metric in a single, meaningful number. For any specific metric, a higher number must consistently mean a better result (or consistently a worse one); the direction must never be ambiguous.
  • Concrete action: for each deterioration it must be unambiguously clear how to go about improving it.
  • Automatable: the metrics must be easy (and preferably very cheap) to calculate automatically. Each developer can be warned automatically if they made a metric significantly worse (or rewarded with a positive report if they improved it) after committing changes to the version control system.

You think this sounds like Utopia and probably requires extremely expensive tools? Think again! With minimal effort one can already attain some very useful application-specific metrics.

Examples

Here I list some possible metrics (actually I have implemented all of these and more as a "hobby" project at my day job):

Counting regular expressions

A lot of useful metrics can be built by counting occurrences of regular expressions. Examples:
  • Counting deprecated constructs: while introducing a new framework in a significant piece of software, there will always be a period where your software features a mixture of both old code and new framework code. Count the API calls into the old code; anyone adding new calls to the old API is warned to use the new API instead (a minimal sketch follows this list).
  • Counting conditions: if you are removing "if (predicate) doSomething;" statements and replacing them with polymorphism, count how often the old predicates are called. Anyone who adds new calls to the predicates can be warned automatically to use the polymorphic approach instead.
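
As an illustration, here is a minimal, self-contained sketch of such a regex-based counter in C++. The oldApi namespace and the hard-coded file list are made up; a real setup would walk the source tree and compare the count against the previous commit:

    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <sstream>
    #include <string>
    #include <vector>

    // Count occurrences of a "deprecated construct" in a set of source files.
    int count_matches(const std::vector<std::string>& files, const std::regex& pattern)
    {
        int count = 0;
        for (const auto& name : files) {
            std::ifstream in(name);
            std::stringstream buffer;
            buffer << in.rdbuf();                    // read the whole file
            const std::string text = buffer.str();
            count += static_cast<int>(std::distance(
                std::sregex_iterator(text.begin(), text.end(), pattern),
                std::sregex_iterator()));
        }
        return count;
    }

    int main()
    {
        // Hypothetical deprecated API: any call into the oldApi namespace.
        const std::regex deprecated(R"(\boldApi::\w+\s*\()");
        const std::vector<std::string> files = { "foo.cpp", "bar.cpp" };  // assumed file list
        std::cout << "deprecated API calls: " << count_matches(files, deprecated) << "\n";
        // A wrapper script could compare this number with the previous commit's
        // result and warn the committer when it goes up.
        return 0;
    }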

Monitoring dependencies between projects and coupling between classes

If your software has a layered structure, you will typically have constraints on which projects are allowed to include headers from which other projects. Count and list all violations by analyzing your include dependencies. I also use include dependencies to get a rough estimate of the coupling between classes and of the impact of adding/removing #includes (by calculating how many extra statements will have to be compiled as a result of the new #include). (Shameless self-plug: you can use my FOSS pycdep tool for this.)

Monitoring McCabe complexity

If you add new "if" statements, you can be warned automatically about increased complexity. This can be automated using a tool like sourcemonitor which has command line options that allow you to bypass its GUI and to integrate it in your own flow.

Unit tests

Check the results of your unit tests after each commit. Anyone breaking one or more tests is automatically warned to fix them.

Using it in real life

Of course, no one is stopping you from adding some machinery on top of these basic tools: run the metrics automatically on every commit, preferably generating incremental results (i.e. the metrics should measure what changed compared to the previous commit, so you get a clear idea of the impact of your changes); generate diffs; distribute the work over different computers and collect the results using a suitable framework; and make the reports available through a web application created in an easy-to-use web framework.

In my day job I have set up two such systems running in parallel. The first system sends one email a day summarizing all changes in all metrics compared to yesterday's version of the software (or, for some metrics, the changes that took place since the start of the current sprint). The comparisons are made against reference reports, which have to be updated explicitly (via the web front-end) and annotated with a reason and a rating (improvement/deterioration/status quo). All team members get this report, so if someone did a really good job of cleaning up code, everyone in the team becomes aware of it (and if they did a really lousy job, there might be some social pressure to get it right ;) ). The second system calculates incremental metrics per commit and sends email reports to the committer only, but makes the reports available to interested viewers on the intranet (together with author, commit message and revision number).

Although such a system can sound scary (I named it "Big Brother"), in practice we only use it to improve both code and team quality, and an anonymous poll showed that, without exception, every developer liked it (no one wants to dive into someone else's lousy code :) ). Reports with significant changes (good or bad) are discussed by the team at the daily standup meeting, and can reveal misunderstandings about the architecture of the code or result in ideas for organizing training sessions.

Caution

Relying solely on such metrics can give a false sense of software quality: one only improves what is measured. Code review by experienced team members is a good complement to all of the above.

If you, dear reader, have ideas for other metrics, or remarks about the contents of this or other blog entries, feel free to comment.