Interoperability and its importance in research software

Posted by s.aragon on 26 October 2018 - 9:12am
Photo courtesy of Hans-Peter Gauster 

By Raniere Silva, Software Sustainability Institute.

Following up on the discussions around culture change and productivity at this year’s Collaborations Workshop (CW18), interoperability is one of the themes of Collaborations Workshop 2019 (CW19).

The Oxford Dictionary defines interoperability as "the ability of computer systems or software to exchange and make use of information." Today, interoperability is a key factor for researchers, not only because it increases efficiency and accuracy, but also because it makes collaborative efforts more fluent. The following four examples make a case for the importance of interoperability in research and why we, here at the Institute, think we need to have a discussion about it at CW19.

Reference

In research, productivity is often measured by the number of citations a researcher's work receives in other publications. Anyone outside academia might reasonably assume that we have an accurate way of counting citations. Unfortunately, there are many citation styles in use (for example APA, MLA, ASA, Chicago, and Vancouver) and sometimes these are not interoperable; that is, a citation in one format can’t always be translated into another. For instance:

Knuth, D. (1983). The TeXbook. Reading, Mass.: Addison-Wesley.

follows the APA style. However, it is not easy to rewrite this citation in the Chicago style, because the Chicago style needs the author’s full name and the APA example doesn't display it:

Knuth, Donald Ervin. 1983. The TeXbook. Reading, Mass.: Addison-Wesley.

The APA citation is also ambiguous about the author: how can we know for sure that it was Donald Ervin Knuth who wrote the book, rather than someone else with the same initial and surname (perhaps someone in his family)?
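The one-way nature of this conversion can be sketched in a few lines of Python. This is only an illustration: the `Book` record and the two formatting functions are hypothetical names, and the formats are simplified to reproduce the two example citations above, not the complete APA or Chicago rules.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical structured record holding the full citation data."""
    family_name: str
    given_names: str
    year: int
    title: str
    place: str
    publisher: str


def format_apa(book):
    # Simplified: keeps only the first initial, as in the example above.
    initial = book.given_names.split()[0][0] + "."
    return (f"{book.family_name}, {initial} ({book.year}). "
            f"{book.title}. {book.place}: {book.publisher}.")


def format_chicago(book):
    # Simplified: needs the full given names, which the APA string drops.
    return (f"{book.family_name}, {book.given_names}. {book.year}. "
            f"{book.title}. {book.place}: {book.publisher}.")


texbook = Book("Knuth", "Donald Ervin", 1983, "The TeXbook",
               "Reading, Mass.", "Addison-Wesley")

print(format_apa(texbook))
print(format_chicago(texbook))
```

Both strings can be produced from the structured record, but the Chicago string cannot be rebuilt from the APA string alone, because "Donald Ervin" has been reduced to "D.".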

To work around some of the issues with citation formats, researchers have invested in the federated Digital Object Identifier (DOI) system, which is important not only for citation but also for other applications (e.g., workflows).

File Format

There is an array of tools available for writing articles, documentation for scripts, or notes for training sessions, such as Microsoft Word, LibreOffice Writer, Google Docs, Atom, Vim, and Emacs. Most of the time, the file format required to submit an article, publish script documentation, or print notes from a training session isn't the one that was originally used to create the document. This requires tools to convert between file formats; for example, Microsoft Word to PDF, Python docstrings to HTML, or R Markdown to EPUB.

The need to convert between file formats is not restricted to writing articles; it is present in many parts of the research process. One researcher might use Microsoft Excel to record data in tabular form but convert the XLSX file to JSON so that it can be analysed by a tool written in JavaScript. Another researcher might download a collection of problems used to benchmark a family of algorithms but have to convert the problems, because their implementation was written decades ago and was never patched to support the newer input format.
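As a rough sketch of this kind of conversion, the Python snippet below turns tabular data into JSON using only the standard library. It assumes the spreadsheet has first been exported to CSV (reading XLSX directly would need a third-party library), and the function name and sample data are made up for illustration.

```python
import csv
import io
import json


def table_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes one object keyed by the column headers.
    # Note that csv reads every value as a string.
    return json.dumps(list(reader), indent=2)


# Hypothetical data standing in for a spreadsheet exported to CSV.
sample = "sample,temperature\nA,21.5\nB,19.0\n"

print(table_to_json(sample))
```

Even in this tiny case the conversion is not neutral: the numbers arrive in the JSON as strings, so the tool that reads the JSON has to know to convert them back.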

As satirised in the xkcd comic "Standards", users will never have one format "to rule them all". As long as file formats are not interoperable, researchers will have to deal with different formats and use tools to convert between them.

Programming Languages

Libraries developed for a particular problem are one of the reasons why researchers choose one programming language over another. However, libraries should be designed and implemented to be interoperable with others. In other words, they should use a common data format; for example, in Python, developers should try to use NumPy arrays to represent matrices.
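To make the idea of a common data format concrete, here is a small sketch in Python. The two functions stand in for two independent libraries and are hypothetical; the only thing they share is that they accept and return NumPy arrays.

```python
import numpy as np


def centre_columns(data):
    """Hypothetical library A: subtract each column's mean."""
    matrix = np.asarray(data, dtype=float)
    return matrix - matrix.mean(axis=0)


def column_ranges(data):
    """Hypothetical library B: maximum minus minimum of each column."""
    matrix = np.asarray(data, dtype=float)
    return matrix.max(axis=0) - matrix.min(axis=0)


measurements = [[1.0, 10.0], [3.0, 30.0]]

# The output of one "library" feeds the other with no conversion step.
centred = centre_columns(measurements)
ranges = column_ranges(centred)
print(ranges)
```

Because both functions agree on the array type, neither needs to know anything about the other's internals.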

And why not make two or more programming languages talk to each other? Python and R have powerful libraries driven by fantastic communities; making them interoperable would open up possibilities for collaboration. For example, reticulate is a "comprehensive set of tools for interoperability between Python and R" and allows you to mix Python and R, which is amazing for teams with complementary experience. What other tools do you know that are interoperable with other languages?

API

Chaining commands together is a practice that dates back to the early days of Unix. Wikipedia says that it "was championed by Douglas McIlroy at Unix's ancestral home of Bell Labs, during the development of Unix, shaping its toolbox philosophy." With the popularisation of cloud computing, we already need to chain commands that run on different machines, but for that to work, online services must provide us with an application programming interface (API). For example, many interesting projects in the social sciences are based on data from social networks and are possible only because some of those platforms, e.g. Twitter, offer APIs to researchers. Some projects focus on real-time detection of events such as earthquakes, while others focus on sentiment and opinion analysis that might, for example, be able to predict election results. What APIs would you need to enable your research?
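For readers who have not met command chaining before, the sketch below shows the local version of the idea in Python: the output of one process is piped straight into another. The two child commands are made-up stand-ins (each is just a tiny Python one-liner), and `run_pipeline` is a hypothetical helper name. An online API plays the same role when the two steps run on different machines.

```python
import subprocess
import sys

# First command: prints three words, one per line.
producer = [sys.executable, "-c", "print('b'); print('a'); print('b')"]

# Second command: counts the distinct words it reads from standard input.
consumer = [sys.executable, "-c",
            "import sys; print(len(set(sys.stdin.read().split())))"]


def run_pipeline(first, second):
    """Run two commands with the first one's output piped into the second."""
    p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second, stdin=p1.stdout,
                          stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so only the second process reads from it.
    p1.stdout.close()
    output, _ = p2.communicate()
    p1.wait()
    return output.strip()


print(run_pipeline(producer, consumer))
```

Neither command knows about the other; they interoperate only because both agree on plain text over standard input and output.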

Come and talk to us at CW19

In the previous four examples, we tried to show concrete cases in which interoperability is key for researchers, but we know that they don't cover the whole landscape. Join us for Collaborations Workshop 2019 to discuss these and other examples. The event will take place from 1 to 3 April 2019 at Loughborough University.

If you wish to discuss this post with us, send us an email or contact us on Twitter @SoftwareSaved.