Software and research: the Institute's Blog

Getting to grips with EPSRC's policy on research data

By Neil Chue Hong, Director.

From 1 May 2015, organisations that receive EPSRC funding, and their researchers, are expected to comply with the EPSRC policy framework on research data. This sets out EPSRC’s principles and expectations concerning the management and provision of access to EPSRC-funded research data, in particular the principle that "research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner".

Archaeology with open-source software. It's getting easier

By Ben Marwick, Assistant Professor of Archaeology at the University of Washington.

This short post is written for archaeologists who frequently perform common data analysis and visualisation tasks in Excel, SPSS or similar commercial packages. It was motivated by my recent observations at the Society for American Archaeology meeting in San Francisco - the largest annual meeting of archaeologists in the world - where I noticed that the great majority of archaeologists use Excel and SPSS. I wrote this post to describe why those packages might not be the best choices, and explain what one good alternative might be. There’s nothing specifically about archaeology in here, so this post is likely to be relevant to researchers in the social sciences in general. It’s also cross-posted on the Arc-Team Open Research blog to celebrate the inclusion of RStudio in the next release of their custom Linux distribution for archaeologists.

Top tips for running a small workshop

By Stephen Eglen, Software Sustainability Institute Fellow and senior lecturer at the University of Cambridge.

Late last year, I ran a workshop with the International Neuroinformatics Coordinating Facility (INCF) in Cambridge. It was regarded by all attendees as a success and it was suggested that we archive some tips for organising a small workshop. Here are those tips.

1. Get help with admin

We were incredibly lucky in that all the administration for the event was taken care of by the INCF, and in particular its program officer, Mathew Abrams. Everyone's travel plans were coordinated, and everyone stayed at the same (beautiful) college. Good admin is a fundamental part of a successful event, but it takes a lot of time to do well, so take any help you can get to ensure that your admin is done well.

Going the Distance with natural abundance

By Alexander Hay, Policy & Communications Consultant, talking with Eric Rexstad, University of St. Andrews.

This article is part of our series: Breaking Software Barriers, in which we investigate how our Research Software Group has helped projects improve their research software. If you would like help with your software, let us know.

Abundance is a good thing not just for animals, but also for the researchers studying them. Studying it is, however, harder than it sounds, which is why it is an area of particular interest for Eric Rexstad, research fellow at the University of St. Andrews' Centre for Research into Ecological and Environmental Modelling.

The exact term for this is Distance Sampling, a method for estimating the population numbers of a particular species in a certain area. For example, "how many harbour porpoises live in the North Sea?" as Eric puts it. Yet this leads on to more complex questions - in particular, how do animal populations react to perturbations or changes in the local environment, such as those caused by pollution or development?
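To give a flavour of the kind of estimate involved, here is a minimal sketch in Python of one textbook form of distance sampling. It is not taken from Eric's group or their software. It assumes a line-transect survey, certain detection on the line itself, a half-normal detection function and no truncation of distances. The function name and arguments are hypothetical, chosen only for illustration.

```python
import math


def half_normal_density(distances, total_line_length):
    """Estimate animal density from perpendicular sighting distances.

    Illustrative assumptions: line-transect survey, certain detection on
    the line, half-normal detection function
    g(x) = exp(-x^2 / (2 * sigma^2)), and no truncation of distances.
    Distances and line length must use the same unit (for example km).
    """
    n = len(distances)
    if n == 0 or total_line_length <= 0:
        raise ValueError("need at least one sighting and a positive line length")

    # Maximum-likelihood estimate of sigma^2 for an untruncated half-normal.
    sigma_sq = sum(x * x for x in distances) / n
    if sigma_sq == 0:
        raise ValueError("all distances are zero; detection width is undefined")

    # Effective strip half-width: the integral of g(x) from 0 to infinity.
    effective_half_width = math.sqrt(sigma_sq * math.pi / 2)

    # Sightings divided by the area that was effectively surveyed.
    return n / (2 * total_line_length * effective_half_width)
```

Multiplying the returned density by the size of the study area gives an abundance estimate. Real surveys also need to account for group sizes, truncation and uncertainty, all of which this sketch leaves out.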

Collaborations Workshop 2015 - an electric mix of people!

By Shoaib Sufi, Community Lead.

The Collaborations Workshop 2015 (CW15) took place last week in Oxford. It brought together an electric and buzzing mix of people with an interest in research software, and was the biggest Collaborations Workshop to date.

An inspiring keynote, a raft of lightning talks, wide-ranging discussions, demos and intense hacking allowed people to explore new ideas and gain advice from experts. With funders, researchers, developers, publishers and managers in attendance, the workshop represented views from every position in academia.

There are many outputs from the CW, which we have made available so that even people who could not attend the event can benefit from the discussions that took place. On the CW website, you can find summaries of the discussions, collaborative ideas, Hackday pitches, slides from the keynote and lightning talks, and the software written during the Hackday. Many of these resources are already available, and more will be available in the coming weeks.

Scientific Data Analysis with Java: DAWN

By Steve Crouch, Devasena Inupakutika, Alun Ashton, Mark Basham and Matthew Gerring

Scientific projects are often created as standalone applications which use their own definitions for algorithms and visualisation tools. This makes it difficult to benefit from other people's work. The DAWN Science project allowed a large group of scientific developers and software engineers to collaborate by developing a single, general-purpose API to allow access and sharing of existing algorithms and visualisation tools. This significantly accelerates the development of new analysis tools. We reviewed the DAWN code and provided advice on how to improve the organisation of the software and sharing of the code.

DAWN (Data Analysis WorkbeNch) is open-source scientific data analysis software for numerical data built on the Eclipse/RCP platform. It is developed by a collaboration of facilities and universities, some of whom are contributing code or development effort and others who use and test the software. The collaborative development is led by Diamond Light Source, which is situated at the Rutherford Appleton Laboratory Campus near Oxford. Diamond is not restricted to a single scientific domain, so the software must cover a wide range of uses, from specialist capability like calibration and data reduction for diffraction equipment, to general capability like peak fitting and an integrated Python development environment including interactive tools such as plotting.

BioJS - free bioinformatics visualisation tools get a software facelift

By Devasena Inupakutika, Software Consultant.

With the advent of data-driven research in the life sciences, researchers have relied on data visualisations to generate hypotheses. Many bioinformatics service providers, such as EMBL-EBI or the NCBI, provide a browser-based environment to do this, as well as new ways to visualise biological data. It is important that the software is both high quality and user friendly, which helps researchers compare and contrast results and develop well-grounded conclusions. The Software Sustainability Institute worked with BioJS to review their code and help with coding standards, ultimately making it easier to develop with BioJS.

BioJS, a multi-partner effort coordinated by TGAC, provides services such as infrastructure, guidelines and tools, to represent biological data on the Web that can be reused by anyone. It is an open-source, community-based project, with a modular, structured design that is ideal for data-intensive research. It allows users to build reusable, interactive applications which can be easily deployed on the web.

Harnessing digital technology for health behaviour change

By Bob Patton, Lecturer in clinical psychology, University of Surrey.

UCL Centre for Behaviour Change (CBC) Conference 2015 was a two-day conference held at Senate House (London) to bring together experts from behavioural science, computer science, engineering and human/computer interaction. The primary keynote presentation was from Professor Bonnie Spring of Northwestern University, who discussed how an over-reliance upon technology-focused solutions can be de-motivating and lead to reduced self-efficacy and higher attrition rates from treatment programmes.

In the context of Precision Medicine – a term used to describe treatment applied to “the right patient, in the right place, at the right time” – we should be seeking to optimise our interventions, rather than to take a scattergun approach and throw everything (including the kitchen sink) at trying to change behaviour. Perhaps the smartest thing that was said was: “Widgets don’t in themselves change behaviour; it’s the underlying principles that count”. As an example, Prof. Spring demonstrated a successful intervention using an old Palm Pilot (i.e. no graphics, limited functionality). The lesson here is to pay attention to the function - there is a lot of robust theory relating to behaviour change, and we should try to use this in our attempts to digitise successful real-world applications.

Irreproducible research - some top tips

By Neil Chue Hong, Director.

Comic number 1869 from PhD Comics. (c) Jorge Cham. Used with permission.

The Software Sustainability Institute is proud to be associated with a major new paper on irreproducible research. The new paper is called "Top Tips to Make Your Research Irreproducible" by Neil Chue Hong, Tom Crick, Ian Gent and Lars Kotthoff, and is due to be published today (1 April) on arXiv. We present some excerpts of the paper with permission of the authors. Readers are encouraged to read the full version.

We have noticed (and contributed to) a number of manifestos, guides and top tips on how to make research reproducible; however, we have seen very little published on how to make research irreproducible.

It is an unfortunate convention of science that research should pretend to be reproducible; our top tips will help you salve the conscience of reviewers still bound by this fussy conventionality, enabling them to enthusiastically recommend acceptance of your irreproducible work.

By following our tips, you can ensure that if your work is wrong, nobody will be able to check it; if it is correct, you can make everyone else do disproportionately more work than you to build upon it. In either case you are the beneficiary.

  1. Think “Big Picture”. People are interested in the science, not the experimental setup, so don’t describe it.
  2. Stay high-level. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details that actually make it work.
  3. Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit.
  4. The deficit model. You’re the expert in the domain; only you can define what algorithms and data to run experiments with.
  5. Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t work at all.

Our most important tip is deceptively but beautifully simple: to ensure irreproducibility of your work, make sure that you cannot reproduce it yourself. If you were able to reproduce it, there would always be the danger of somebody else being able to do exactly the same as you.

Scholarship in software, software as scholarship: a view from the humanities

By James Baker, Curator, Digital Research, British Library @j_w_baker

There are complex challenges in the humanities around software sustainability. Although humanists rely on software to do research, and increasingly on software developed by their own community, many if not most do not value the use of software, and their nascent systems of credit for good software development and reuse are fragile. If the humanities are to make the best of the vast and growing digitised and born-digital corpora held by research libraries, key stakeholders in the field must ascribe the same value to the development of and experimentation with research software as they do to traditional practices such as literature surveys, source critique, and written publications.

In order to deepen my knowledge of these challenges and opportunities, I recently attended Scholarship in Software, Software as Scholarship: From Genesis to Peer Review - a two-day meeting at Universität Bern that brought together an international audience of scholars, developers, funders, and associated individuals to consider the status, role, and assessment of software in humanities research. The discussions that interspersed the scheduled short papers, keynotes, and round table were varied, fluid, and expansive in character - indeed even the utility of Wissenschaft as a term capable of overcoming the Anglophonic divide between 'the sciences' and 'the humanities' was addressed. Nevertheless, three themes were prominent: theory, community, and practice.