Data Management Plan

Read our plan to manage the data collected and produced by the Software Sustainability Institute. 

Data Collection

What data will you collect or create?

We will be collecting:

  • data required to operate the Institute (e.g. registration information for events, personal information relating to applications to Fellowships and Open Calls)
  • data required to evaluate the performance of the Institute (e.g. equal opportunities related data, feedback on events and interactions)
  • data required to conduct research (e.g. types of software used, grants data)

We will be creating:

  • aggregated data sets summarising data collected above
  • software and tools relating to improvement of software

How will the data be collected or created?

We will be collecting data using:

  • Forms and surveys
  • Text and data mining of existing data sources
  • Analysis of existing data sources
  • Software development

Documentation and Metadata

What documentation and metadata will accompany the data?

All data will be published in Zenodo and institutional repositories, with information on its purpose and how it was generated.

Additionally, an accompanying paper, technical report or blog post will be created to give an example of the use of the data.

Ethics and Legal Compliance

How will you manage any ethical issues?

The project will include data collected by survey from persons representative of the research community and the research software community. This is required to provide evidence for the impact of research software in the different research communities, so that policy and guidance decisions can be implemented.

We do not plan to have equal numbers of male and female participants, as this would be unrepresentative of the current demographics of certain communities we will survey. However, we will be considering gender and diversity issues as part of these studies.

Personal information will be collected in the form of names, email addresses, and information about each participant's job role and the organisation they work for. This data will be anonymised and unlinked before analysis. Research participants will not be identifiable, though we are aware that this may require some data to be binned in the event that a participant is the only person in that role at that organisation. In this case, their data will only be used as part of aggregated, depersonalised data sets.
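As an illustration only, the anonymisation and binning described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the Institute's actual pipeline: the field names, the sample rows, the `min_group_size` threshold and the `("Other", "Other")` bin label are all hypothetical.

```python
from collections import Counter

# Hypothetical field names for direct identifiers (an assumption, not from the plan).
IDENTIFYING_FIELDS = {"name", "email"}


def strip_identifiers(rows):
    """Return copies of the rows with the direct identifiers removed."""
    return [
        {key: value for key, value in row.items() if key not in IDENTIFYING_FIELDS}
        for row in rows
    ]


def aggregate_with_binning(rows, min_group_size=2):
    """Count responses per (role, organisation), pooling small groups.

    Any group with fewer than min_group_size members is merged into a
    single ("Other", "Other") bin so that a lone role holder is not
    reported individually. The pooled bin could itself still be small;
    a real analysis would need to check for that before publishing.
    """
    counts = Counter((row["role"], row["organisation"]) for row in rows)
    aggregated = Counter()
    for group, size in counts.items():
        key = group if size >= min_group_size else ("Other", "Other")
        aggregated[key] += size
    return dict(aggregated)


# Made-up example responses, for illustration only.
responses = [
    {"name": "A. Example", "email": "a@example.org",
     "role": "Research Software Engineer", "organisation": "Org A"},
    {"name": "B. Example", "email": "b@example.org",
     "role": "Research Software Engineer", "organisation": "Org A"},
    {"name": "C. Example", "email": "c@example.org",
     "role": "Professor", "organisation": "Org B"},
    {"name": "D. Example", "email": "d@example.org",
     "role": "Postdoc", "organisation": "Org C"},
]

anonymised = strip_identifiers(responses)
summary = aggregate_with_binning(anonymised)
```

Here the two single-person groups are pooled into one bin, while the group of two is reported as-is. The appropriate threshold would depend on the survey and on the guidance of the relevant ethics committee.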

These studies will be passed by the ethics committees of the participating project partner organisations. Similar studies by members of the project consortium have previously been approved by the ethics committees.

How will you manage copyright and Intellectual Property Rights (IPR) issues?

All data and software created by the Institute will be released under open licenses to encourage reuse by the community. The default license will be Creative Commons Attribution (CC BY) for data and the BSD 3-Clause license for software.

The exception will be where work involves background IP with additional restrictions (for instance, if we undertake consultancy for a group). In this case, work done by us will typically be released under terms agreed with the other organisations.

Storage and Backup

How will the data be stored and backed up during the research?

Non-sensitive data is primarily stored in Google Drive and GitHub, and backed up on local disks and to Edinburgh's DataStore facility.

Sensitive data is analysed on encrypted, password-secured laptops and desktops, and backed up to institutional repositories at Edinburgh, Manchester and Southampton (depending on where the analysis is taking place).

How will you manage access and security?

All data is stored or processed on machines which are password protected, restricting access to authorised users only, and with encrypted storage in case of theft.

Selection and Preservation

Which data are of long-term value and should be retained, shared, and/or preserved?

Data and software related to published papers will be preserved.

Additionally, data expected to have high reuse value (such as data collected in surveys of the community) will be shared and preserved.

What is the long-term preservation plan for the dataset?

Data selected for preservation will be deposited in Zenodo, as well as in institutional repositories.

Data Sharing

How will you share the data?

Data selected for sharing will be deposited in Zenodo, as well as in institutional repositories. The DOI issued will be published in related papers and articles.

Are any restrictions on data sharing required?

In general, there are no restrictions on data sharing, and sharing will be done under open licenses (CC BY for data and BSD 3-Clause for software).

The exceptions are:

  • personal information about human subjects (e.g. through surveys): we will only share aggregated, anonymised and unlinked data
  • data owned by others (e.g. data from commercial organisations we have an agreement to analyse): where possible, we will publish synthetic or example data alongside the code and scripts used to analyse the data

Responsibilities and Resources

Who will be responsible for data management?

The PI, Neil Chue Hong, is primarily responsible for data management and can be contacted by email.

What resources will you require to deliver your plan?

We will use existing institutional resources for data storage and preservation, as well as third-party infrastructure (such as Zenodo, GitHub and FigShare) for storage, sharing and preservation of non-sensitive data.