CW20 speed blog: How to engage Research Group Leaders in sustainable software practices

Posted by g.law on 27 May 2020 - 8:29am
Image: carpentry tools on a wall. Photo by Barn Images on Unsplash.

By Emma Rand (editor), Will Furnass, Sam Mangham, Marion Weinzierl, Tarek Allam, Catherine Smith and Pablo Bernabeu

This post is part of the CW20 speed blog posts series.

There is an increasing number of training courses introducing early career researchers to sustainable software practices, but relatively few are aimed at Research Group Leaders and Principal Investigators. Expecting group leaders to personally acquire such skills through training such as a two-day Carpentries workshop is unrealistic, as such courses require a significant time investment and are less directly applicable to the role of research director. In addition, many group leaders do not consider their group to be producing software, or are not aware of the full range of benefits that sustainable practice brings, and are thus less likely to signpost such training to their team members. Even where they do identify benefits, they may have concerns about releasing group software or may feel overwhelmed by the potential scale of the task, especially with respect to legacy projects.

The important role that Research Group Leaders have in influencing the overall shape of projects and the research culture in their teams makes them worth specifically targeting. Research Software Engineers are well placed to provide training but it can be difficult to identify and articulate the most compelling arguments for sustainable practices for research leaders from diverse disciplines. Demonstrating how time invested in practices such as version control, documentation and testing can pay dividends in the long term is required to make the case for the inclusion of sustainable practices in projects from the start.

We suggest using five prompting questions to communicate to research leaders what they have to gain by leading their group to sustainable practices, or to lose by not doing so.

  1. How long would it take you to combine the data from two figures into one in response to a reviewer’s comments?

    • Is it time consuming to revise previous analyses and visualisations? Do general data processing scripts exist that can be applied to a new data set, and are those scripts future-proof?

  2. How do you induct a new PhD student or postdoc into the data and software management practices of your lab?

    • Is there any induction? Does induction happen organically and take a long time?

  3. How many people need to leave before you can’t (re)run an analysis? 

  4. How can we access your code, and how are new contributions merged in?  

    • Access and contribution are best facilitated by online repositories such as GitHub (other options include OSF and Figshare). Crucially, these platforms keep a version history, which allows contributors and the public to view the history of changes and to access previous versions of the documents.

  5. How do you make sure that a change in one part of the code does not break any of its functionality? 

    • In the medium term, learning how to set up unit testing checkpoints may pay off.
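As a minimal sketch of what such a testing checkpoint can look like, assuming a Python code base: the function `normalise` below is a hypothetical stand-in for one step of a group's analysis, and the three `test_` functions record what that step is expected to do. If a later change breaks the behaviour, the checks fail immediately rather than the error surfacing in a figure months later.

```python
def normalise(values):
    """Scale a list of numbers so that they sum to 1.0.

    Hypothetical analysis step, used only to illustrate testing.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise values that sum to zero")
    return [v / total for v in values]


def test_result_sums_to_one():
    # The defining property of the function: outputs sum to 1.
    assert abs(sum(normalise([2, 3, 5])) - 1.0) < 1e-9


def test_proportions_are_preserved():
    # Relative sizes of the inputs must not change.
    assert normalise([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_all_zero_input_is_rejected():
    # Invalid input should fail loudly instead of returning nonsense.
    try:
        normalise([0, 0])
    except ValueError:
        return
    raise AssertionError("expected a ValueError for all-zero input")


if __name__ == "__main__":
    # Run every check; any failure stops the script with an error.
    for check in (
        test_result_sums_to_one,
        test_proportions_are_preserved,
        test_all_zero_input_is_rejected,
    ):
        check()
    print("all checks passed")
```

Functions named in this `test_` style can also be collected by a test runner such as pytest, so the same checks can be re-run automatically whenever a contribution is merged.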

To promote the adoption of sustainable practices in research groups, we devised a storyboard at the SSI Collaborations Workshop 2020 for a video targeted specifically at group leaders. Watch the concept video.

Whilst research leaders are key targets, it’s not feasible or appropriate for them to get an in-depth understanding of all the tools and techniques required for sustainability as their time spent doing front-line research is constrained. In addition, when many projects have the same needs for sustainable development, leaders shouldn’t need to invent their own procedures. We propose developing a toolkit to help them implement sustainable software development in their own groups. They need:

  • Checklists to facilitate process audits, induct new group members, and determine what needs documenting before a researcher leaves. 

  • A glossary of key terms for sustainable software practices which includes a rationale for inclusion in grant proposals.

  • A concise set of training opportunities and resources that principal investigators can signpost to researchers (e.g. the StackExchange suite, or more specific forums such as RStudio Community).

  • A handbook template to document group processes (including checklists, processes for collaborating using version control tooling, information regarding data schemas/sources, etc.).

  • Examples that demonstrate value but are also achievable.

One example protocol that includes many of the above elements was recently made public by a social sciences research group (Maquate, Schliewe, & Knoeferle, 2020). The document outlines the common project workflow, safety and security guidelines, the code of conduct and important resources. In addition, it names the individuals responsible for specific activities.

We should all remind ourselves that if group leaders do not operate sustainable software practices, it is not through bad intention, but because they lack the training and experience needed to fully appreciate the value of those practices. It is essential that research software engineers make it as easy as possible for group leaders to acquire the minimum necessary knowledge to lead their teams into sustainable software practices and fully reproducible research. In addition, we need to educate and support them in recognising the staffing and resource costs of this essential activity, so that they can incorporate those costs into grant proposals.


Want to discuss this post with us? Send us an email or contact us on Twitter @SoftwareSaved.