Software Deposit and Preservation Workshop

Posted by s.aragon on 24 July 2018 - 9:45am

By Mike Jackson, Software Architect, The Software Sustainability Institute

On the 11th July, the Software Sustainability Institute and Jisc ran a Software Deposit and Preservation Workshop at St. Anne’s College, Oxford. This workshop brought together 12 research data managers, digital repository vendors, publishers, policymakers and researchers. We reviewed draft guidance on software deposit and preservation, discussed software deposit and preservation from the perspectives of the foregoing stakeholders, and explored ways in which to drive forward the adoption of best practices in software deposit and preservation.

Jisc’s Research Data Shared Service

The workshop was part of an activity, funded by Jisc, to provide software deposit and preservation guidance, and in particular to develop use cases and workflows for software deposit via Jisc’s Research Data Shared Service (RDSS). It was a follow-up to our March Software Deposit and Preservation Policy and Planning Workshop, which brought together RDSS pilot institutions, and others interested in research software preservation, to provide a better understanding of current practice, policy and guidance in this area.

Christopher Brown of Jisc gave an introduction to, and update on, the current status of Jisc’s Research Data Shared Service, a service to allow researchers and institutions to meet their policy requirements for the deposit and curation of research data. Complementing the RDSS is the Research Data Management Toolkit, an information resource for researchers, research support staff and IT specialists.

Guidance on software deposit

From the outputs of the first workshop, we had drafted a collection of guides on software deposit for researchers, research leaders and research data managers. The guides provide advice on different aspects of depositing software into digital repositories: why researchers should deposit their software into a digital repository; when they should deposit their software; where they could deposit their software and how to choose a digital repository; how they should deposit their software; what should (and should not) be part of a software deposit; how to describe a software deposit (its metadata); how to choose a software licence for a deposit; and how to review a software deposit.

I presented a high-level overview of this guidance to the attendees, who provided valuable feedback, comments and suggestions, which are now being used to revise this guidance. Once complete, version 1.0 of this guidance will be published later this summer, both in Zenodo and online (keep an eye on this blog and @softwaresaved on Twitter).

After publication, we’ll source, from the research software community, examples of what are considered to be good software deposits, and good research software generally, and why these are considered to be good. We will curate these as a resource to complement the guidance.

One approach to sourcing examples, suggested by Naomi Penfold, is to use hypothes.is tags to annotate such examples on the web (see, for example, Naomi’s reproducible-research-showcase and software-deposit tags).

Researchers willing to use the guidance and see how it works in practice will also be sought. Their experiences, and feedback from the research software community more generally, will be used to update the guidance with the intent that an updated version will be published in Summer 2019, possibly as a Research Data Alliance Software Source Code Interest Group output.

Perspectives on software deposit and preservation

Four invited speakers, representing different stakeholder groups, presented their perspectives on software deposit and preservation, its challenges and its opportunities.

Naomi Penfold of eLife gave a presentation on Research software preservation: a publisher's perspective, sharing what eLife have learnt so far about software citation in research, how they are working to preserve research software, and their requirements for software preservation workflows and initiatives.

Roberto Di Cosmo of Software Heritage gave an introduction to their universal source code archive, an ambitious international initiative to provide a universal archive and reference system for all (yes, all!) software, which is constantly evolving to harvest ever more software from an ever-expanding number of sources. Roberto also introduced new functionality for depositing source code into the Software Heritage archive. This is already available through hal.inria.fr (the Inria-dedicated portal on top of HAL, the French national open access archive) and will be generalised to all of France this fall.

Federica Fina of the University of St Andrews gave a presentation on Software deposit at the University of St Andrews, describing research data management at St Andrews, the challenges and questions that are faced, and how, in collaboration with their Research Computing team, they advise their researchers.

Finally, Antonis Lempesis of the Athena Research and Innovation Centre presented OpenAIRE’s guidelines for research software. The OpenAIRE Guidelines for Software Repository Managers 1.0 provide orientation for software repository managers to define and implement their local software management policies in exposing metadata for software products.

Discussions

The attendees discussed blockers to software deposit and preservation and how to overcome these obstacles, as well as more general issues around assessing the quality of software both deposited into digital repositories and submitted to publishers. Key points from the discussion are as follows.

Blockers to software deposit

Blockers can include lack of time, lack of inclination, a feeling that research software is “not yet good enough” to be made public, or a concern that others might use it (and so publish results derived from using the software before its authors do).

Version control and software deposit

Researchers developing software should be strongly encouraged to use version control. However, there should be no requirement that researchers must use version control before being able to deposit their software into a digital repository, as this imposes an additional hurdle for researchers who don’t currently use these tools. Guidance, and tools, should accommodate these researchers too.

Software quality and reviewing

If software is not deposited or published, then no-one can find out whether or not it produces incorrect results. Software that has been deposited is not necessarily good or scientifically correct, but it does allow others to check whether or not it is. There should be an expectation that research software will, and should, be shared and deposited.

Publishers’ concerns about publications being “right” or “correct” lead to concerns about any associated software being “correct”. Reviewing can help mitigate these concerns by providing a reassurance that at least two people can get the same result from the same software with the same data in the same environment. However, repetition of exact results may not always be appropriate or easy (for example, if the software includes random number generation).
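To illustrate the random number generation point, here is a minimal Python sketch. It was not presented at the workshop; the function names simulate and agree_within, the seed value and the tolerance are all hypothetical. It shows two common ways a reviewer might still compare runs: fixing the seed so that a run can be repeated exactly, or accepting results that agree within a stated tolerance.

```python
import random


def simulate(n_samples, seed=None):
    """Hypothetical stand-in for research software: the mean of n_samples uniform draws."""
    # A dedicated generator keeps the seed local to this run.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n_samples)) / n_samples


def agree_within(a, b, tolerance):
    """Return True if two results differ by no more than the given tolerance."""
    return abs(a - b) <= tolerance


# Option 1: fix the seed, so that two runs in the same environment match exactly.
first = simulate(1000, seed=42)
second = simulate(1000, seed=42)
print("Seeded runs identical:", first == second)

# Option 2: leave the seed unset and compare within a tolerance chosen by the authors.
run_a = simulate(100_000)
run_b = simulate(100_000)
print("Unseeded runs agree within 0.05:", agree_within(run_a, run_b, 0.05))
```

Which option is appropriate depends on the software and the claims made in the paper. Either way, recording the seed or the tolerance alongside the deposit gives a reviewer something concrete to check.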

In 2011, the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) experimented with artefact evaluation, allowing authors to submit for review not just their paper but also complementary artefacts, including software. The reviewers checked for consistency with the paper, completeness, quality of the documentation and ease of reuse. Badges were awarded to artefacts deemed to be of high quality. The goal was to check whether an artefact matched the expectations set up by its paper. This evolved into an ACM policy for Artifact Review and Badging, for use by ACM conferences and journals.

Publishers such as the ACM and journals such as the Journal of Open Research Software have reviewers who review software submitted to their publications. It would be useful to explore whether those willing to act as reviewers for publishers could be brought together with research data managers to offer a similar software review capability for digital repositories.

Software reviewing by publishers or digital repository managers can be both costly and time-consuming, and may require the services of research software engineers.

A complementary approach is to promote software reviews before submission, to be arranged by the researchers themselves. Software reviews are not only good software development practice but also good research practice. Large research teams may use standard software development practices, including reviews and release management processes. Solo PhD students or other researchers may not, as they may lack the time or expertise to do so. Providing such researchers with guidance and support for reviewing and releasing their software may help to improve the quality of their software, improve their software development skills, and persuade them that their software is “good enough” to be published. One low-cost approach is to encourage such researchers to discuss with a peer what their code does and how it does it, and to show it in action. This can serve as an informal form of peer/code review.

Promoting best practice and measuring cultural shifts

Best practice cannot be adopted in one go. There is a need to encourage researchers to make small, incremental improvements in their behaviours. Once researchers realise that adopting good practices is less challenging than feared, and recognise their intrinsic value, then a cultural shift will occur.

Evidence of such cultural shifts may be anecdotal, for example a feeling that, over time, the nature and quality of research software are improving. Repository managers and research data managers reported ways they measure adoption: for example, noting how often researchers ask a question about a process or a requirement, or how often a practice is followed.

Communities can, and should, evolve at their own pace.

Acknowledgements

Thank you to Jisc for funding this event as part of the Jisc RDSS Software Deposit and Preservation Project, to our four invited speakers, to all our attendees for their valuable contributions during the workshop and to this blog post, and to my Institute colleagues Clem Hadfield and Graeme Smith for administering the workshop.