Software and research: the Institute's Blog

Building a bridge between a virtual machine and the outside world


By Mike Jackson, Software Architect.

The Distance project at the University of St. Andrews uses Windows XP virtual machines to develop its Distance for Windows software. The project's interface code, implemented in Visual Basic, is not held under revision control, and institutional security policies mean that its XP virtual machines cannot be connected to the network.

In this blog post, I describe my experiences of using Git and shared folders to address both these problems, as part of our recent open call collaboration.

Scientific coding and software engineering: what's the difference?

By Daisie Huang, Software Engineer, Dryad Digital Repository.

What differentiates scientific coders from research software engineers? Scientists tend to be data-discoverers: they view data as a monolithic chunk to be examined and explore it at a fairly fine scale. Research Software Engineers (and software engineers in general) tend to figure out the goal first and then build a machine to do it well. In order for scientists to fully leverage the discoveries of their predecessors, software engineers are needed to automate and simplify the tasks that scientists already know how to do.

Scientists want to explore. Engineers want to build

I've been thinking a lot about the role of coding in science. As a software engineer turned scientist, my research is extremely computational in nature: I work with genomes, which are really just long character strings with biological properties. My work depends on software developed by myself and many, many other scientists. Scientists are, by and large, inquisitive and intelligent people who are fast learners and can quickly pick up new skills, so it seems natural that many would teach themselves programming. When I first started talking to scientist-coders, I thought that perhaps I could relate to them from a programming perspective, and maybe bring some experience in formal software design practices to teaching scientists about coding. I started working with Software Carpentry and organisations of computational scientists in my field (Phylotastic, Open Tree of Life, Mesquite), and became more involved in figuring out what motivates scientists to take time out of their research and learn to code.

An introduction to CGAT

By Andreas Heger, CGAT Technical Director.

Today, biologists have access to high-throughput measurement techniques that can assay many variables or entities at the same time. One striking example has been the advent of massively parallel sequencing techniques in the form of next-generation sequencing (NGS).

While the sequencing of the human genome took more than ten years and cost billions of pounds just a decade ago, a researcher can now send off material to a sequencing service and expect multiple human genomes' worth of data within a few weeks, for not much more than the cost of a typical experiment. Unfortunately, few biologists are trained to deal effectively with the data handling and statistical issues that the resulting large data sets raise.

Adopting automated testing


By Mike Jackson, Software Architect.

Automated tests provide a way to check both that research software produces scientifically valid results and that it continues to do so if it is extended, refactored, optimised or tidied. Yet one challenge that can face researchers, especially those with large, legacy codes, is this: where to start?

The prospect of having to write dozens of unit tests can be off-putting at the best of times, let alone when one has data to analyse, a paper to write or a conference to prepare for. Our new guide on Adopting automated testing describes an approach that introduces end-to-end, or system, tests first.
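To give a flavour of what an end-to-end test can look like, here is a minimal sketch, assuming a command-line program that reads an input file and writes numeric results to an output file. The test runs the whole program as a black box, exactly as a user would, and compares its output against a reference output that was previously checked by hand, allowing for small rounding differences. The program itself (`LEGACY_PROGRAM`) and the helpers `run_program`, `outputs_match` and `system_test` are hypothetical stand-ins, not code from the guide.

```python
import math
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a legacy research program: it reads one number
# per line from an input file and writes their mean and standard deviation.
LEGACY_PROGRAM = """
import statistics, sys
with open(sys.argv[1]) as f:
    values = [float(line) for line in f if line.strip()]
with open(sys.argv[2], "w") as out:
    out.write(f"{statistics.mean(values)} {statistics.stdev(values)}\\n")
"""


def run_program(script: Path, input_path: Path, output_path: Path) -> None:
    """Run the program as a black box; raise if it exits with an error."""
    subprocess.run(
        [sys.executable, str(script), str(input_path), str(output_path)],
        check=True,
    )


def outputs_match(actual: Path, expected: Path, rel_tol: float = 1e-9) -> bool:
    """Compare numeric outputs field by field, tolerating rounding differences."""
    actual_fields = actual.read_text().split()
    expected_fields = expected.read_text().split()
    return len(actual_fields) == len(expected_fields) and all(
        math.isclose(float(a), float(e), rel_tol=rel_tol)
        for a, e in zip(actual_fields, expected_fields)
    )


def system_test() -> bool:
    """Run the program on a known input and check it reproduces the reference output."""
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp_dir = Path(tmp_name)
        script = tmp_dir / "legacy.py"
        script.write_text(LEGACY_PROGRAM)
        input_path = tmp_dir / "input.txt"
        input_path.write_text("1\n2\n3\n4\n")
        # Reference output: mean 2.5, sample standard deviation sqrt(5/3).
        expected = tmp_dir / "expected.txt"
        expected.write_text("2.5 1.2909944487358056\n")
        actual = tmp_dir / "actual.txt"
        run_program(script, input_path, actual)
        return outputs_match(actual, expected)


if __name__ == "__main__":
    print("system test passed" if system_test() else "system test FAILED")
```

A test like this needs no knowledge of the program's internals, so it can be written before any refactoring and then rerun after every change to confirm the results have not drifted.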

Automated GUI testing with AutoHotKey


By Mike Jackson, Software Architect.

As part of a recent open call collaboration with the Distance project at the University of St. Andrews, I was asked about open source tools to automatically test GUI-based applications on Windows.

By coincidence, an EPCC colleague had recently asked me the same question. So, I hit Google and Wikipedia, tracked down some candidates and decided to try the free open source AutoHotKey toolkit. In this blog post, I describe my experiences with this "scriptable desktop automation" tool.

Freeware and open source GUI test tools

Wikipedia lists a number of GUI test tools, and Google revealed a couple of others. However, only a few of these are free or open source:

Project funding and economical sustainability in historical research

By Adam Crymble, Institute Fellow 2013.

This is the first in a series of articles by the Institute's Fellows, each covering an area of interest that relates directly both to their own work and the wider issue of software's role in research.

If the Internet went down, all historical software would cease to function, except for Microsoft Word. For an academic historian, a grant to build a high-profile web-based project is likely the biggest pot of money he or she will ever receive during their career. That is, if they ever receive it, as few historians even apply. Instead, most are content to work in a fashion relatively similar to the way they did before the Internet came along. They go to the archives, read books and manuscripts, and write up their findings. This is their tried and tested mode of research, with costs limited to a few new books now and again, a train ticket or two to get to the archives, and refreshments while they're there.

Historical research is still largely a solo intellectual pursuit rather than a technical team-based one. There is nothing wrong with that. Not all discovery needs to be expensive, and as a taxpayer, I find it refreshing that there are still corners of the academic world in which spending more money isn't the easiest way to career progression. For the ambitious few who rise to the challenge and put in a proposal, meanwhile, the resulting website, and in some cases the hundreds of thousands of pounds of funding that come with it, have made project leaders celebrities within the field. This celebrity brings with it all the accolades and resentment one might expect from fame.

A currency for peer review: PubCreds and Academic Karma

By Lachlan Coin, Academic Karma and Associate Professor, University of Queensland.

In 2010, Jeremy Fox and Owen Petchey proposed an innovative idea: fix peer review by introducing a peer review currency, which they called PubCreds[1]. Fox and Petchey noted that peer review suffers from a tragedy of the commons, in which "individuals have every incentive to exploit the reviewer commons by submitting manuscripts, but little or no incentive to contribute reviews. The result is a system increasingly dominated by cheats (individuals who submit papers without doing proportionate reviewing), with increasingly random and potentially biased results as more and more manuscripts are rejected without external review." Their solution was to privatise the commons by introducing a currency that is earned by reviewing others' manuscripts and spent by having one's own manuscripts reviewed.
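The core mechanism of such a currency can be sketched as a simple ledger: reviewing credits an account, and submitting debits it, with submissions refused when the balance is too low. This is a minimal sketch only. The class `ReviewLedger` is hypothetical, and the prices `CREDITS_PER_REVIEW` and `CREDITS_PER_SUBMISSION` are illustrative assumptions, not the values Fox and Petchey proposed.

```python
from dataclasses import dataclass, field

# Illustrative prices only: assumptions for this sketch, not the PubCreds values.
CREDITS_PER_REVIEW = 1
CREDITS_PER_SUBMISSION = 3


@dataclass
class ReviewLedger:
    """Hypothetical ledger tracking each researcher's review-credit balance."""

    balances: dict[str, int] = field(default_factory=dict)

    def record_review(self, reviewer: str) -> None:
        """Completing a review earns credit."""
        self.balances[reviewer] = self.balances.get(reviewer, 0) + CREDITS_PER_REVIEW

    def submit(self, author: str) -> bool:
        """Submitting a manuscript spends credit; refuse if the author cannot pay."""
        balance = self.balances.get(author, 0)
        if balance < CREDITS_PER_SUBMISSION:
            return False
        self.balances[author] = balance - CREDITS_PER_SUBMISSION
        return True
```

For example, a new researcher who tries to submit straight away is refused; after completing three reviews, the same submission is accepted and the balance returns to zero. Setting the submission price above the reward for a single review reflects the idea that each manuscript consumes several reviewers' effort.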

Symptoms of the tragedy of the commons in peer review

One of the main symptoms is slower communication of science. Fox and Petchey describe other symptoms, including an increasing tendency for journals to send only a small fraction of the papers they receive out for peer review, resulting in greater randomness in what eventually gets published. Another symptom is editors inviting many more reviewers than necessary in order to secure the minimum number needed (anecdotally, around five times as many).

Feedback from Oxford Software Carpentry

By Philip Fowler, Software Sustainability Institute Fellow and postdoctoral researcher at the Department of Biochemistry at the University of Oxford.

Republished from the original post on Phil's blog.

The Wellcome Trust Centre for Human Genetics at the University of Oxford hosted its first Software Carpentry workshop this January. So how did the workshop go? I'm a bit biased, so to get a better idea I sent the participants a questionnaire similar to the one I sent to participants of the Software Carpentry workshop I organised previously.

Open licences for people in a hurry (again)


By Mike Jackson, Software Architect.

Back in January, I blogged about tl;drLegal, an online resource to help us choose a suitable open-source licence. In the same spirit, the Institute of Formal and Applied Linguistics at Charles University in Prague provides the Lindat license selector to help select open licences for both software and data.

Through a short set of questions, the Lindat license selector can guide you to a licence that meets your software and data sharing requirements while satisfying any constraints imposed by software or data you have reused.

The World beneath our feet

By Kristian Strutt, Experimental Officer at the University of Southampton, and Dean Goodman, Geophysicist at the Geophysical Archaeometry Laboratory, UC Santa Barbara.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Archaeological practice in the field seems so down to earth. The daily routine of excavation, recording of stratigraphy, finds and contexts, and understanding the different formation processes – it is what we are, and what we do. 

However, it is easy to overlook the scientific aspects of our work, which shape how archaeology comes to understand past human activity.