What makes good code good at EMCSR 2014

Posted by s.crouch on 15 September 2014 - 2:00pm

By Steve Crouch.

On August 8th 2014, I attended the first Summer School in Experimental Methodology in Computational Research at the University of St Andrews in Scotland. Run as a pilot aimed primarily at computer scientists, it explored the latest methods and tools for enabling reproducible and recomputable research. The aim is to build on this successful event and hold a bigger one next year.

The Institute already works with the Summer School organisers on a related project, recomputation.org. Led by Ian Gent, this project aims to allow other researchers to reproduce scientific results generated using software, by packaging up the software and its dependencies into a virtual machine that they can easily download and run.

My five-hour trip from Manchester meant I had ample time to enjoy some stunning views of the Scottish countryside (and the multitude of golf courses as you approach St Andrews!). If the camera on my phone had been working, you would have been able to see what I mean. But doubtless it wouldn't have done the scenery justice anyway.

Upon my arrival, I was greeted by a welcome British sight - a barbeque for the participants and speakers, which thankfully wasn't accompanied by another British tradition, rain. Running from Monday, the Summer School had already covered a diverse set of topics, from using Microsoft's Azure Cloud to enhance reproducible research, to legal issues in Computer Science and collaborating on code development. Our Institute Director, Neil Chue Hong, had also presented a well-received session at the School, covering the reasons why reproducibility is important in research.

Following my talk about developing sustainable software, we had time for me to ask our famous question: 'What makes good code good?'. After splitting into groups to discuss the issue, the participants came back with the following:

Firstly, code should be reusable: it should be in a form that is readily usable by others, both internally within the team that developed it, and externally by other groups. It's often hard to know when others will want to see and use your code - so assume they will from the outset!

Good supporting user documentation is a must too, as it explains how the science relates to (and is accomplished by) aspects of the code. The point was also raised that inline comments are a source of documentation about how the code works and, importantly, why particular coding approaches were taken.

The code should be understandable and readable. It should be well commented and clear, with sensible naming of variables and functions, and the coding style should be consistent. It should use a programming language that is used by your research community, and it should be under version control and released under a clear versioning policy.
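To make the points about naming and 'why' comments a little more concrete, here is a minimal sketch in Python. The function and its names are my own illustration, not something the groups produced: the names say what each value is, and the one inline comment explains why the calculation is done the way it is, rather than restating what the line does.

```python
import math


def sample_standard_deviation(measurements):
    """Return the sample standard deviation of a sequence of numbers."""
    if len(measurements) < 2:
        raise ValueError("need at least two measurements")

    mean = sum(measurements) / len(measurements)
    squared_deviations = [(value - mean) ** 2 for value in measurements]

    # Divide by n - 1 rather than n (Bessel's correction): the measurements
    # are treated as a sample drawn from a larger population, not the whole
    # population itself.
    variance = sum(squared_deviations) / (len(measurements) - 1)
    return math.sqrt(variance)
```

A reader who has never seen this code can tell what it computes from the names alone, and the comment answers the question they are most likely to ask about the design.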

The development of code should not be judged in isolation, but should instead take into account other software that others use in the field - when and how should it work with other code? The code should be tested and testable. Code should have unit tests with good code coverage, and others should be able to test it. Naturally, it should also be correct, which is to say, the code does what it's supposed to do.
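As a rough illustration of what 'tested and testable' can look like, here is a small sketch using Python's standard unittest module. The function celsius_to_kelvin is a made-up example of mine rather than anything discussed at the School; the point is only that each test checks one expected behaviour, including the error case.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    kelvin = celsius + 273.15
    if kelvin < 0:
        raise ValueError("temperature is below absolute zero")
    return kelvin


class CelsiusToKelvinTest(unittest.TestCase):
    def test_freezing_point_of_water(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)

    def test_rejects_temperature_below_absolute_zero(self):
        # Invalid input should fail loudly instead of returning nonsense.
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

If this were saved as, say, test_temperature.py (a hypothetical filename), anyone could run the tests with `python -m unittest test_temperature`, which also covers the point that others should be able to test your code.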

You should avoid obsolete approaches. Code should make use of third party libraries and standards that are accepted, well supported and have a future, to avoid issues further down the road. It should be continuously integrated - if you have the spare effort, having code changes validated by running tests on it automatically can greatly benefit development teams (especially large ones).

Your code should be easily built: if others can't build it, they can't develop it, and if binaries aren't supplied, they won't be able to run it either. It needs to be portable, so that others can run it on their system of choice. It should also be modular. Having well-structured code makes the software more understandable and easier to maintain and develop, and can also allow easier reuse of code modules in other development projects.

Finally, you need to Keep It Simple, Stupid. Follow the KISS heuristic, and don't over-complicate things! You mustn't forget user support either. If you don't support the wider community in their use and development of the code, they won't support its use!

The most important aspects were reusability, documentation, understandability and readability, use of a community-supported language, and versioning, which were highlighted by multiple groups.

A big thank you to Ian Gent, Olexandr Konovalov and Tristan Henderson for organising such a great Summer School and inviting the Institute to present, and to the participants for taking part and engaging in some great discussions. I'm looking forward to a bigger event next year!