By Aleksandra Pawlik, Training leader, Software Sustainability Institute.
Good and bad experiences, horror stories and uplifting examples of good practice were all heard at the Software and Research Town Hall at the American Geophysical Union Fall Meeting 2013. There were about fifty attendees from different institutions, representing different stakeholders: from scientists developing code to advance their research to software engineers working with researchers on their scientific code.
Thanks to Allen Pope and Melody Sandells (who co-chaired the session), this Town Hall meeting was joined by Marco Tedesco, Polar Cyberinfrastructure Program Director at NSF's Division of Polar Programs. Marco introduced the key ideas of the Program, which connected directly with the three main topics covered during the session: Collaboration Strategies and Technologies, Software Training for Researchers, and Code Reuse, Sharing and Publishing. All attendees were encouraged to engage in discussion, and in this post I will summarise what they said.
Collaboration strategies and technologies
There is a wide variety of tools which can be used for collaboration: Google Drive, Dropbox, Skype and old-school email. In general, collaboration on the same resources does not appear to be an issue. But if you scratch the surface, problems emerge. Some institutions don’t allow scientists to use certain very popular tools (Skype, for example). Some tools simply don’t scale up well, for instance when a large number of collaborators try to join a Google Hangout. Not all researchers want to pay for group video calls on Skype. Others get stuck when they have to transfer terabytes of input data to their collaborators (as someone in the audience commented, “Mail the hard drives”).
Software training for researchers
This part of the session turned into quite an intense discussion. Everybody agreed that software training is essential for anyone doing research, and that the training should start as early as possible in a researcher's career. Graduate students need to be taught software skills and to understand their value; otherwise they write code which is unstable, difficult to maintain and full of security gaps. There are a lot of training materials available online (on Coursera, for example). However, if students do not have basic training in software development, they will not be able to make good use of these resources.
Training should not be limited to writing code. Students should also be taught good practices for commenting and testing code. Comments should be meaningful and explain why you do something, not just what you do (comments such as "Now adding x+y" are not helpful). It needs to be emphasised that comments require maintenance just like code. More often than not, when the code is changed the comments aren't, and this can lead to major problems.
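To make the point about comments and tests concrete, here is a minimal Python sketch. It was not shown at the session: the function name, the -9999.0 sentinel and the data logger it refers to are all hypothetical, chosen only to illustrate a comment that explains why the code does something, paired with a small test.

```python
# Hypothetical sentinel: assume the data logger writes this value
# whenever a sensor fails to report a reading.
MISSING = -9999.0


def mean_temperature(readings):
    """Return the mean of the valid readings, in degrees Celsius."""
    # Unhelpful comment: "remove values equal to MISSING" (restates the code).
    # Helpful comment: missing samples are stored as MISSING, so leaving
    # them in would drag the mean far below any physically real value.
    valid = [r for r in readings if r != MISSING]
    if not valid:
        raise ValueError("no valid readings to average")
    return sum(valid) / len(valid)


def test_mean_temperature_ignores_missing():
    # The test documents the intended behaviour and fails loudly
    # if a later change breaks it.
    assert mean_temperature([1.0, MISSING, 3.0]) == 2.0
```

If the sentinel value ever changes, both the constant and the comment next to it have to be updated together, which is exactly the kind of comment maintenance discussed above.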
Code reuse, sharing and publishing
Version control for managing source code turned out to be a popular solution among the session's attendees. Many used Git and GitHub. But as one person in the audience noted, version control can be a bit of a headache: the time you spend figuring it out is usually precious time taken away from research. The attendees also discussed the benefits of code review and different ways it could be done, from pair programming to group meetings where the code is analysed line by line. Everyone was in favour of source code being published alongside the papers it was used for. One exemplary journal that was mentioned, which actually reviews and publishes both the paper and the code, was Computer Physics Communications. CPC has been doing this for over 15 years, and we need more journals to implement such a policy.
The Town Hall meeting produced a number of new topics and leads to follow up. Most importantly, the session showed that exchanging ideas and sharing experiences helps us learn how to address the issues related to software and research.