CW19 Mini-workshops and Demo Sessions

Mini-workshop and demo sessions will give an in-depth look at a particular tool or approach, and a chance to ask developers and experts how it might apply to attendees' own areas of work.

Here is the provisional list of mini-workshops and demos that will take place at CW19. More sessions will be added before the event.

Attendees are encouraged to ask questions and enquire about how they could use the approaches and tools presented: these sessions are designed to be interactive.

Session 1

On Monday 1 April 2019, 15:50 - 16:40.

Tech Writing 101 (Part 1)

Speakers: Sarah Maddox, Google, and Sharif Salah, Google.

Room: WPT.0.06

Operations manuals, design docs, code comments, email messages, post-mortems. Writing things down for others is unavoidable in the life of a researcher, an engineer, a UX designer, a product manager, and, of course, a tech writer.

This workshop leads attendees through a series of pair-work exercises to improve the clarity, readability, and effectiveness of their writing. Attendees will learn from an experienced Google technical writer and from each other.

Gigantum: A technological force multiplier for the open science rebel alliance

Speaker: Dav Clark, Gigantum.

Room: WPT.0.08

Grassroots change is most effective when coupled with a scaling technology. Applying this maxim to efforts promoting accessibility and meaningful reproducibility suggests that we need scaling technologies to increase the number and diversity of researchers doing high-quality collaborative and open science. Since science is increasingly code, we should learn from examples such as GitHub's development of their pull request UI and the subsequent explosion of collaboration on code. We look to such examples in software development to inform our own streamlining of complex tools and workflows, to enable a similar explosion of collaboration among a diverse set of researchers.

This workshop will introduce participants to Gigantum, an approach intended to scale reproducible and collaborative data science by supporting individual users. We’ll first introduce the Gigantum Client, an MIT-licensed web application that runs locally to simplify and automate the application of tools like Docker and Git while integrating with environments like JupyterLab. We’ll then describe services hosted by Gigantum that enable single-click publication and collaboration from the Client. Participants will learn about versioning and collaboration features, as well as new approaches to creating and managing scientific datasets. We’ll also show how sophisticated users (e.g., Research Software Engineers, Data Librarians, etc.) can go under the hood to create customized data science environments that are easily distributed and accessible to diverse users with a broad spectrum of skills. Finally, time permitting, there will be a demonstration of how to create portable and reproducible GPU-based workflows.

We encourage participants to install Docker CE locally, but there will be a cloud-based instance for users who need it, along with an extra team member on hand to answer questions or help with troubleshooting.

Please install Gigantum in advance. https://gigantum.com/download will guide you to:

  • Install Docker CE on your machine
  • Install a Gigantum launcher (desktop by default, but there’s a CLI)
  • Download and launch the initial Gigantum Client container

If this doesn’t work out, use https://try.gigantum.com. In either case, create an account and log in!
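Before the session, it may be worth confirming that your Docker installation works. The following check uses only generic Docker commands, nothing Gigantum-specific, and the fallback message is just a suggestion mirroring the cloud option above:

```shell
# Quick pre-workshop check: is Docker CE installed and usable?
if command -v docker >/dev/null 2>&1; then
    # Print the installed Docker version; run "docker info" separately
    # to confirm the daemon itself is running.
    docker --version
else
    # No local Docker: fall back to the hosted instance instead.
    echo "Docker not found - use https://try.gigantum.com instead"
fi
```

If `docker --version` prints a version string, the local install route should work; otherwise the cloud instance is the fallback.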

Towards a production-ready solution for reproducible articles

Speaker: Emmy Tsang, eLife.

Room: WPT.0.09

A main goal of the eLife Innovation Initiative is to support the development of technology and process innovations that encourage and recognise the most responsible behaviours in science, which includes the active sharing of research data and methods. Over the past two years, together with Substance and Stencila, we have been developing an open technology stack that will enable researchers to publish reproducible manuscripts through online journals (the Reproducible Document Stack, RDS). The motivation for the RDS came from recognising one of the key problems with the current established format of research articles: research methods are not generally described in sufficient detail to allow other researchers to faithfully repeat key experiments. This is in part due to the increasing complexity of modern research, in particular in its computational methods.

In this session, we wish to showcase the first computationally reproducible article published on eLife, based on a paper written by Tim Errington, Director of Research at the Centre for Open Science. I will demonstrate how a researcher can create a reproducible article with Stencila Desktop, an open-source, easy-to-use manuscript editor which combines the traditional authoring workflows of Word and Excel with the ability to embed R and Python code blocks that can analyse tabular data and generate live interactive plots. The reproducible article can be viewed, edited and executed from within a web browser; plots can be updated live by re-running the embedded code. I will discuss the underlying technologies of the reproducible article, and our future plans to develop the RDS into a scalable, production-ready solution. We invite feedback from the community on the project’s potential functionalities and development roadmap.

Session 2

On Monday 1 April 2019, 16:50 - 17:40.

Tech Writing 101 (Part 2)

You must have attended Tech Writing 101 (Part 1).

Speakers: Sarah Maddox, Google, and Sharif Salah, Google.

Room: WPT.0.06

Operations manuals, design docs, code comments, email messages, post-mortems. Writing things down for others is unavoidable in the life of a researcher, an engineer, a UX designer, a product manager, and, of course, a tech writer.

This workshop leads attendees through a series of pair-work exercises to improve the clarity, readability, and effectiveness of their writing. Attendees will learn from an experienced Google technical writer and from each other.

Adding CI and automated testing to your project, using the ANVIL service

Speaker: Ilektra-Athanasia Christidi, University College London.

Room: WPT.0.08

The value of CI and automated testing when developing research software is well accepted. But making it a reality is a different story: setting up Travis CI on GitHub is easy, but severely limited if one needs to use proprietary software or compilers, test on Windows, or test code that uses MPI. A local Jenkins server can be set up to cover all the cases one needs to test, but it poses a considerable burden on the team or person maintaining it. Jenkins solutions in the cloud are emerging, but probably require considerable amounts of time and fairy dust to set up.

Enter ANVIL, the EPSRC and STFC's centralised testing service for the UK academic community, where maintained, well-integrated Jenkins servers are freely available for all your CI and testing needs. Too good to be true? We will all find out during this demo, in which I'll attempt to set up automated testing for a new project using a proprietary compiler and MPI. Safety goggles optional.

Ontologies and Interoperability

Speaker: Alexandra Simperler, Simperler Consulting/Goldbeck Consulting.

Room: WPT.0.09

This presentation is based on efforts by the European Materials Modelling Council (EMMC) and is thus related to materials modelling software. These efforts provide a case example that highlights several topics relevant to the software community in general.

Our community encounters essentially two types of interoperability problem: (a) horizontal interoperability, i.e. the use of different codes for a single materials case, and (b) vertical interoperability, i.e. transferring data between several codes and model types used to simulate the same materials case.

An interoperability environment facilitates the required information exchange. Depending on scope, we distinguish several semantic interoperability levels, including: scientific community level (enabling exchange between scientists), material user case level (representation of the material), and numerical level (code representations). An ontology such as the European Materials Modelling Ontology (EMMO) covers the semantic representations at the different levels, supporting human and eventually machine interoperability.

This intensive one-hour educational presentation aims to provide participants with an introduction to ontologies and scenarios for interoperability. The importance of a common language and of standardized communication between different levels and stakeholders will be highlighted.

Finally, we will discuss how the EMMC supports efforts to achieve interoperability of materials models by establishing open standards for the integration of different codes (e.g. academic and commercial, open and closed source), referred to as the Open Simulation Platform (OSP).

The material is provided by the EMMC-CSA (www.emmc.info) partners Gerhard Goldbeck, Emanuele Ghedini, Adham Hashibon, Georg J. Schmitz and Jesper Friis.

Session 3

On Tuesday 2 April 2019, 15:30 - 16:10.

Document all the things! (How again?)

Speaker: Stephan Druskat, Humboldt-Universität zu Berlin.

Room: WPT.0.05

Documentation is key in sustaining (research) software. How can we expect future users, developers and maintainers to run, use, contribute to or maintain software if they cannot refer to its documentation? While this point seems, almost tautologically, clear, the question of how best to document a software project remains unanswered, although there are ongoing efforts, including dedicated conferences such as "Write The Docs", that aim to provide answers.

In this mini-workshop we will take a brief look at a specific aspect of documentation: documentation types. Based on a use case from a new research project working to sustain a piece of research software, we will discuss the requirements of documentation for research software in terms of:

  • What must or should be documented? What types of documentation should a research software project provide in order to survive and be re-usable? Is there a minimal and a maximal set of required documentation types or levels?
  • How do the different types of required documentation determine how the documentation should actually be written (or generated)?
  • What are the requirements for tooling that supports the creation of the required documentation? Is such tooling available?

In the course of the workshop we will try to find some answers to these questions, with the aim of generating the starting point for a document on the documentation of research software. We can then collaborate further on this document during the hack session to create an output (a blog post, whitepaper or publication) that will help other research software practitioners to configure the documentation of their own projects.

The Turing Way: A handbook for reproducible research

Speaker: Kirstie Whitaker, Alan Turing Institute.

Room: WPT.0.06

Reproducible research is necessary to ensure that scientific work can be trusted. Funders and publishers are beginning to require that publications include access to the underlying data and analysis code. The goal is to ensure that all results can be independently verified and built upon in future work. This is sometimes easier said than done: sharing these research outputs means understanding data management, library sciences, software development and continuous integration techniques, skills that are not widely taught or expected of academic researchers and data scientists. The Turing Way is a handbook to support students, their supervisors, funders and journal editors in ensuring that reproducible data science is "too easy not to do". It includes training material on version control, analysis testing, and open and transparent communication with future users, and builds on Turing Institute case studies and workshops. This project is openly developed, and any and all questions, comments and recommendations are welcome at our GitHub repository: https://github.com/alan-turing-institute/the-turing-way. During this session, the project's lead developer, Kirstie Whitaker, will lead a collaborative review of the content so far and show CW19 participants how they can contribute their knowledge to make it even better going forwards.

Archiving and Citing Research Software

Speaker: Daina Bouquin, Harvard-Smithsonian Center for Astrophysics.

Room: WPT.0.08

Access to research software is foundationally important to both the future and the heritage of scientific research. Deep intellectual contributions are being made by people building software that is increasingly inseparable from the data itself. Digital research artifacts, code in particular, present new challenges to traditional scholarly communication models and digital preservation practices, as versioning and authorship in these contexts are fluid. It is nevertheless essential that scientists are encouraged to create these valuable resources, and that practices are adopted to enable people to share these complex, distributed, changing tools as easily as they share articles. A brief overview of emerging best practices in this context will be presented, along with concrete actions people can take to make their research software and data more open, citable and persistent, ensuring the legacy of their work.

Mapping Metadata: Human and Machine-Readable Ontologies for Digital Archives

Speaker: Melodee Beals, Loughborough University.

Room: WPT.0.09

This session will demonstrate one of the outputs of the Oceanic Exchanges research project. As part of our research into digitised newspaper databases, the research team carefully mapped the metadata schemas of 10 different newspaper repositories in order to make their data (and our derived data) interoperable and allow for a multi-national and cross-lingual analysis. The session will discuss how we chose to map the different data sets using the documentation provided by digitising organisations, as well as oral histories and archival research into how the metadata and descriptions for these objects were originally conceived. The result is a computational humanities approach to integrating what researchers think a database is saying and what it actually encodes. What, for example, do you really mean by "title"?