What infrastructure do you need to start developing research software? - our top tips

Posted by m.jackson on 6 June 2012 - 12:36pm

By Mike Jackson.

It doesn't matter whether you've just started to develop research software, you're close to a first proof of concept, or you're about to release a prototype: at some stage you will ask yourself, "What infrastructure do I need?" In this post, I'll present our top tips on the infrastructure to use when starting to develop research software.

Whether you're part of an international collaboration or just a solo researcher, infrastructure will make a valuable contribution to your development. Our tips will help you be unselfish: open, responsive, communicative and considerate to your fellow researchers to encourage them to engage with you and contribute to the onward development of your software and your research.

1. Revision control is the single most important tool you'll use!

Also known as version control or a source-code repository, revision control is the most important tool you'll ever use when developing research software. Revision control is a way of storing software that allows you to retrieve any version of your code from any point in time (thus avoiding nightmares like the complete loss of code that results from a stolen laptop). It allows you to record the provenance of your software - what you wrote, the ideas you explored, the changes you made and the reasons you made them. If you're part of a development team, then revision control allows you to work together without accidentally deleting each other's code.

How important is revision control? Greg Wilson of Software Carpentry said "If you are not using revision control then, whatever else you may be doing with a computer, you are not doing science".

2. Use your website to tell the world about you

Your website is your public face, and it is the first port of call for anyone seeking information about your project. When starting out, you don't need frames, Flash or an under construction page (they are, like, so 1990s); you just need some simple, cleanly presented and informative HTML pages.

You should provide an overview of your research and links to papers, presentations, your licence and software releases (and maybe a recommended citation, to encourage researchers who use your software to acknowledge your contribution to their research). Provide a prominent link to your contact details, like your email address, to encourage others to get in touch (see our tip below).

It is very important that your website is regularly updated. If a website looks old, people will assume that your project is not active. Keeping your website up to date also helps to show progress to important collaborators (like funders).

If you don't have web expertise in your team, you could consider basing your website on a wiki or a blog, both of which make it relatively simple to set up a website and add content. This simplicity comes at the cost of reduced flexibility in functionality and design, but that's generally only a problem for big projects. We've recently been working with the MICE particle physics project who use a wiki for their MAUS software, and PELAGIOS use a blog. In fact, JISC now frequently require that the projects they fund have, at the very least, a blog to describe what's happening with the project. For more information, take a look at our five top tips for promoting your software. On which note...

3. Register a domain name

Owning a domain name is vital for your project's identity. It provides a home for your project and allows you to build the resources needed to create a community around your software. A domain name can be registered very cheaply (for example, see 123-reg or LCN).

Trademarking the name of your software can help you to protect and reinforce your unique identity (for example, OGSA-DAI is trademarked in the UK). For help with trademarking, see the UK Intellectual Property Office.

4. Encourage people to get in touch by providing an email address

Researchers will want to get in touch to ask for more information about your project, so you should provide an email address.

Your address should be based on your project or software name, rather than your personal name (e.g. dna-splicer@university.ac.uk instead of j.bloggs@university.ac.uk), because this gives people the sense that your project is a professional entity, rather than a one-person affair run from your bedroom. It is very important that your emails are received by more than one person. This ensures that emails are read and replied to promptly, even if one or more of your team are unavailable. A researcher may well judge your entire project on the basis of a single unacknowledged email.

You should preserve your emails and replies, because they can be used as a history of your project's communications with the outside world, and can be used to create metrics ("we had 32 support requests and successfully resolved all of them") which will be of interest to funders.
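
As a rough illustration of how such a metric could be produced, here is a minimal Python sketch. The SupportRequest record, its fields and the sample data are all hypothetical, invented for this example; in practice you would fill the list from your own email archive.

```python
from dataclasses import dataclass


@dataclass
class SupportRequest:
    """One support email thread (hypothetical record, not from any real tool)."""
    subject: str
    resolved: bool


def summarise(requests):
    """Return a one-line summary of support activity for a progress report."""
    total = len(requests)
    resolved = sum(1 for r in requests if r.resolved)
    return f"we had {total} support requests and resolved {resolved} of them"


# Illustrative data only; real entries would come from your archived emails.
archive = [
    SupportRequest("Install fails on Linux", resolved=True),
    SupportRequest("How do I cite the software?", resolved=True),
    SupportRequest("Crash when loading large files", resolved=False),
]

print(summarise(archive))
```

Even a count this simple is only possible if the emails and replies have been kept, which is the point of archiving them.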

5. Use an email list to communicate

If you're working with other people, whether they be at your home institution or elsewhere, then an email list is a must-have. It will help you manage the development of your software and the running of your project. With your trusty email list, you can discuss any aspect of your project - whether it be what new feature to implement, when to release the next version, what conference to go to, when and where to have a meeting, or the minutes from the previous meeting.

As with your contact email address, you should archive your email list so that it can act as a history of your project.

6. Manage to-dos with an issue tracker

An issue tracker (also known as a ticketing system) may be intimidating at first, and may appear a little heavyweight if you're a one-person project. But it's not, really!

Think of an issue tracker as a to-do list - a place where you can record all the features you're yet to implement and the bugs you're yet to fix. In addition, you can use it to record any other project-related tasks, for example "rerun analysis on the Central London Data Set", "submit an abstract to Molecular Dynamics Conference 2012", or "write a presentation for Professor John Smith's research group". Check out the MICE particle physics project's issue tracker for their MAUS software or Tracking community intelligence with Trac within the Institute - by our very own Rob Baxter and Neil Chue Hong - on how we use the Trac ticketing system to manage our work. (And yes, my task to write this article had an associated ticket on our Trac system.)
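
To make the to-do list analogy concrete, here is a minimal Python sketch of the kind of information a ticket holds. It is an assumption-laden toy, not how Trac or any other tracker is implemented: the Ticket class, its fields and the open_tickets helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A single to-do item (hypothetical; real trackers store much more)."""
    number: int
    summary: str
    kind: str            # "feature", "bug" or "task"
    is_open: bool = True


def open_tickets(tickets, kind=None):
    """Return the open tickets, optionally restricted to one kind."""
    return [
        t for t in tickets
        if t.is_open and (kind is None or t.kind == kind)
    ]


# Illustrative tickets, using the example tasks from the text above.
tickets = [
    Ticket(1, "Rerun analysis on the Central London Data Set", "task"),
    Ticket(2, "Submit an abstract to Molecular Dynamics Conference 2012", "task"),
    Ticket(3, "Fix crash when the input file is empty", "bug", is_open=False),
]

for ticket in open_tickets(tickets, kind="task"):
    print(f"#{ticket.number}: {ticket.summary}")
```

A real issue tracker adds owners, priorities, comments and a history of changes on top of this, but the core idea is the same numbered, searchable list.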

In the early days of your project, only your website and email address need to be public. As your project grows, and gathers users, you may want to consider opening up some other resources. Providing public, read-only access to your source-code repository might engage the interest of researchers who wish to contribute to its development. A public issue tracker will allow your users to see how you are dealing with your software's bugs and intended features. Evolving your infrastructure as users and developers gather around your research and software will be the subject of future top tips. As for these top tips, are we completely off target, spot on, or somewhere in between? Why don't you tell us what you think?