Data Management Plan

Plan Overview

Title: The Software Sustainability Institute: Phase 4

Creator: Neil Chue Hong

Principal Investigator: Neil Chue Hong

Data Manager: Kirsty Pringle

Project Administrator: Kirsty Pringle

Contributor: Selina Aragon, Caroline Jay, Simon Hettrick, Kirsty Pringle

Affiliation: University of Edinburgh

Funder: UKRI Future Leaders Fellowships

Template: UKRI Future Leaders Fellowships Template for a Data Management Plan

ORCID iD: 0000-0002-8876-7606

ID: 139006

Start date: 01-06-2024

End date: 31-05-2028

Last modified: 08-01-2024

The Software Sustainability Institute: Phase 4

0. Proposal name

0. Enter the proposal name

The Software Sustainability Institute - Phase 4

1. Description of the data

1.1 Type of study

SSI-4 will include the following studies:

Studies into the research software community (e.g. software used, training required, barriers to entry)
Conducting collaborative research with participant researchers into effective EDIA (equity, diversity, inclusion and accessibility) interventions
Analysing the effectiveness of SSI-4 programmes to support those working with research software (including the SSI Fellowship Programme and the Software Funding Pilot).

Additionally, we will collect data that is not part of a specific research study, but necessary for the pursuit of the objectives of the Software Sustainability Institute.

1.2 Types of data

We will be collecting:

Data required to conduct research studies (e.g. survey and interview data, data on software development from repositories)
Aggregated data sets summarising research into software and tools relating to improvement of software (e.g. types of software used, grants data)
Data required to operate the institute (e.g. registration information for events, personal information relating to applications to Fellowships and Open Calls)
Data required to administer financial programmes of the Institute (e.g. expense claims)
Data required to evaluate the performance of the Institute (e.g. equal opportunities related data, feedback on events and interactions)

1.3 Format and scale of the data

The data generated and used in our project will primarily be in digital formats, including code repositories, survey data, datasets, documentation, and associated metadata. Interview data will be transcribed and coded to provide a machine-readable form in addition to the original recordings. Standard file formats such as CSV and plain text will be used to ensure accessibility, interoperability and long-term validity of the data.

The scale of the data will vary depending on the project components. We anticipate working with moderate-sized datasets, ranging from kilobytes to gigabytes. Provisions will be made for scalable storage solutions as the project progresses.

2. Data collection / generation

2.1 Methodologies for data collection / generation

We will be collecting and generating data using:

Forms, interviews and surveys
Text and Data mining of existing data sources
Analysis of existing data sources
Software development
Documentation and Metadata

We will be adhering to the FAIR principles for data and software.

2.2 Data quality and standards

To maintain data quality, rigorous validation and testing procedures will be implemented during the data gathering and processing phases. Data reviews and audits will be conducted as required, and any anomalies or errors in the data will be corrected.
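As an illustration only, a validation pass over survey data held as CSV could look like the following minimal sketch. The column names, permitted values and the `validate_rows` helper are hypothetical and chosen for the example; they do not describe the Institute's actual survey schema or tooling.

```python
import csv
import io

# Hypothetical schema: column name -> function returning True for a valid value.
RULES = {
    "respondent_id": lambda v: v.isdigit(),
    "career_stage": lambda v: v in {"student", "postdoc", "staff", "other"},
    "years_coding": lambda v: v.isdigit() and 0 <= int(v) <= 60,
}

def validate_rows(csv_text):
    """Return (line_number, column, value) for every value that fails a rule."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column, is_valid in RULES.items():
            value = (row.get(column) or "").strip()
            if not is_valid(value):
                problems.append((line_number, column, value))
    return problems

# Synthetic sample data, invented for this example.
SAMPLE = """respondent_id,career_stage,years_coding
1,postdoc,7
2,professor,3
x3,student,120
"""

problems = validate_rows(SAMPLE)
for line_number, column, value in problems:
    print(f"line {line_number}: invalid {column}: {value!r}")
```

Reporting anomalies with their line numbers, rather than silently dropping rows, keeps a record of what was corrected during review.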

We will adhere to community and industry standards for software development and data representation. This includes using version control systems, adhering to coding standards, and employing metadata standards to enhance data interoperability.

3. Data management, documentation and curation

3.1 Managing, storing and curating data

Non-sensitive data is primarily stored in Google Drive and GitHub, and backed up to Edinburgh's DataStore facility.
Sensitive data is analysed on encrypted, password-secured laptops and desktops, and backed up to institutional repositories at Edinburgh, Manchester and Southampton (depending on where the analysis is taking place).

All data is stored or processed on machines which are password protected, restricting access to authorised users only, and with encrypted storage in case of theft.

Rigorous data management practice will be ensured through regular backups, version control, and documentation of data formats and structures. A designated research data manager will oversee the long-term research data management (RDM) strategy, including migration plans for changing technologies and identification of operational data to be destroyed once the retention period is complete.

3.2 Metadata standards and data documentation

Following UKRI open data policies, the SSI will record and make metadata available and discoverable to other researchers in a way that helps them to understand the research and reuse potential of the data.

A metadata record will be created in the University of Edinburgh's research outcomes system, PURE, ideally at the time of collection / generation of the data / software or, at the latest, by the time of publishing.

Published results will always include information about how to access the supporting data and software. Appropriate metadata will be added to data and software published in Zenodo and institutional repositories, with information on its purpose and how it was generated.
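To make the idea of a descriptive record concrete, the sketch below builds a small JSON metadata file that could accompany a deposited dataset. The field names and the `build_metadata_record` helper are assumptions made for illustration; they are not the schema used by PURE, Zenodo or any institutional repository, and the example values are synthetic.

```python
import json

# Hypothetical field names chosen for illustration; not a repository schema.
REQUIRED_FIELDS = ("title", "creators", "purpose", "generation_method", "licence")

def build_metadata_record(**fields):
    """Return a JSON string, refusing records with empty required fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps(fields, indent=2, sort_keys=True)

# Synthetic example values, invented for this sketch.
record = build_metadata_record(
    title="Example community survey responses (synthetic)",
    creators=["A. Researcher"],
    purpose="Illustrate which research software respondents use",
    generation_method="Online survey, aggregated before release",
    licence="CC-BY-4.0",
)
print(record)
```

Rejecting incomplete records at creation time is one way to ensure that purpose and generation method are captured before a deposit is made.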

3.3 Data preservation strategy and standards

Data of long-term value (such as data and software related to papers published by the SSI, data collected by surveys of the community, and aggregated EDIA data) will be preserved through deposition in Zenodo, as well as in institutional repositories, along with information on its purpose and how it was generated.

Where appropriate, an accompanying paper, technical report or blog post will be created to give an example of the use of the data.

4. Data security and confidentiality of potentially disclosive information

4.1 Formal information/data security standards

The services operated by EPCC (the lead department at the lead partner for the SSI) are accredited for information security and quality management - this includes the Edinburgh International Data Facility which is proposed to be used for some studies.

ISO 27001: Certificate number #276767-2018-AIS-GBR-UKAS
Cyber Essentials: Organisation name - EPCC

4.2 Main risks to data security

Potential risks to data security include unauthorized access, data breaches, and loss due to technical failures. Mitigation measures will involve implementing access controls, encryption protocols, regular security audits, and collaboration with institutional IT security teams.

The main sources of personal data are interviews and surveys, and information provided by participants in our events and programmes. We will ensure that access is limited to those who require it, and that all members of the team have had appropriate information security training.

5. Data sharing and access

5.1 Suitability for sharing

In general, there are no restrictions on data sharing. We will use open standards and licences (CC BY for data and BSD 3-clause for software) to facilitate sharing, ensuring that the outputs are accessible to the broader community.

The exceptions are:

personal information about human subjects (e.g. through surveys): we will only share aggregated, anonymised and unlinked data
data owned by others (e.g. data from commercial organisations we have an agreement to analyse): where possible, we will publish synthetic / example data alongside the code and scripts used to analyse the data.
certain operational and financial data.
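The first exception above, sharing only aggregated, anonymised and unlinked data, can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold of five, the record fields and the `aggregate_for_release` helper are hypothetical, and real disclosure control would need review against the relevant ethics approval.

```python
from collections import Counter

# Assumption: groups with fewer respondents than this are not reported individually.
MIN_GROUP_SIZE = 5
POOLED_LABEL = "other (pooled small groups)"

def aggregate_for_release(records, field, min_group_size=MIN_GROUP_SIZE):
    """Count records per value of `field`, pooling small groups together.

    Only counts are returned, so identifiers in the records (such as email
    addresses) never reach the released output.
    """
    counts = Counter(record[field] for record in records)
    released = {}
    pooled = 0
    for value, count in counts.items():
        if count >= min_group_size:
            released[value] = count
        else:
            pooled += count
    if pooled:
        released[POOLED_LABEL] = pooled
    return released

# Synthetic records, invented for this example.
records = (
    [{"email": f"p{i}@example.org", "domain": "physics"} for i in range(6)]
    + [{"email": f"b{i}@example.org", "domain": "biology"} for i in range(5)]
    + [{"email": f"h{i}@example.org", "domain": "history"} for i in range(2)]
    + [{"email": "l0@example.org", "domain": "law"}]
)

summary = aggregate_for_release(records, "domain")
print(summary)
```

Note that a pooled category can itself remain small; in that case a further step, such as withholding it, would be needed before release.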

We will primarily be sharing through Zenodo and Edinburgh DataShare.

5.2 Discovery by potential users of the research/innovation data

To enhance discoverability of the open data, comprehensive metadata and documentation will accompany the data. This includes clear descriptions of the data's purpose, methodology, and potential use cases. We will follow the University of Edinburgh's commitment to FAIR data sharing, as well as the FAIR principles for research software.

5.3 Governance of access

All data and software being shared will be released under open licences; there are therefore no restrictions on access once they are deposited in a repository.

It is the PI's ultimate responsibility to determine which data and software are being shared.

5.4 The study team’s exclusive use of the data

It may be important for the study team to have exclusive use of certain data during specific phases of the project (e.g. for analysis).

Where appropriate this data will be clearly identified and segregated from publicly accessible data. Access controls will be implemented to limit usage to authorised team members, promoting an environment conducive to focused research and development.

The exclusive use period will be clearly defined and linked to project milestones. Once it ends, the data will be moved to broader access categories, as outlined in the project's data access policy.

Release to wider audiences, including the broader research community or the public, will follow protocols that cover the necessary documentation, versioning information, and the steps needed for a smooth transition from exclusive use to broader accessibility.

5.5 Restrictions or delays to sharing, with planned actions to limit such restrictions

Certain conditions may necessitate restrictions or delays in the sharing of specific data. These conditions could include contractual obligations, intellectual property considerations, ethical concerns, or regulatory requirements.

a. Contractual Obligations: In cases where contractual agreements dictate limitations on data sharing, the SSI will engage in proactive discussions with involved parties to negotiate terms that allow for the responsible release of data. The negotiation process will aim to align contractual requirements with the institute's commitment to open science and collaboration.

b. Intellectual Property Considerations: If intellectual property considerations or pending patent applications impact the immediate release of certain data, the SSI will explore options for sharing non-sensitive metadata, aggregated results, or summaries.

c. Ethical and Regulatory Compliance: In instances where ethical considerations or regulatory requirements demand careful handling of specific data, the SSI will adhere to established ethical guidelines and regulatory frameworks. The team will work to obtain necessary approvals, implement anonymisation strategies, or fulfil any prerequisite conditions to enable timely data sharing within ethical boundaries.

5.6 Regulation of responsibilities of users

In general, external users accessing open data and software produced by the SSI will not be bound by data sharing agreements.

Close collaborators who require access to raw data, personal or confidential data, or uncurated generated data for analysis on jointly undertaken studies will be bound by a data sharing agreement or standard contractual clauses outlining their responsibilities. They are expected to comply with these terms, provide proper attribution, and adhere to ethical standards. The Institute uses monitoring mechanisms, training initiatives, and legal review to ensure responsible data use. Continuous communication, user feedback, and periodic reviews contribute to a collaborative and accountable research environment.

6. Responsibilities

The PI, Neil Chue Hong, is primarily responsible for data management and can be contacted by email. Data management at the preservation and sharing lifecycle stages may be delegated to others in the team; for example, deposit and review of previous deposits will be handled by the relevant project manager from the Project Office. Day-to-day responsibility for operational data is delegated to the Associate Director of Operations, Selina Aragon.

7. Relevant policies

7. Relevant institutional, departmental or study policies on data sharing and data security

Policy	URL or Reference
Data Management Policy & Procedures	https://information-services.ed.ac.uk/about/policies-and-regulations/research-data-policy
Data Protection Policy	https://data-protection.ed.ac.uk/data-protection-policy
Data Security Policy	https://infosec.ed.ac.uk/information-protection-policies/information-security-recommended-reading
Data Sharing Policy	https://information-services.ed.ac.uk/about/policies-and-regulations/research-data-policy
Institutional Information Policy	https://information-compliance.ed.ac.uk/freedom-information/published-information/information-compliance
DataShare service policies	https://library.ed.ac.uk/research-support/research-data-service/after/data-repository/service-policies

8. Author and contact details

8. Author of this Data Management Plan (Name) and, if different to that of the Principal Investigator, their telephone & email contact details

Neil Chue Hong, n.chuehong@epcc.ed.ac.uk

Kirsty Pringle, k.pringle@epcc.ed.ac.uk
