Software and research: the Institute's Blog

BioC 2014 - the right way to conduct Bioconductor

Last month saw the BioC 2014 conference take place at the Dana-Farber Cancer Institute, Boston, MA. Starting with a Developer Day on July 30th, it continued with a series of talks and workshops until August 1st.

Bioconductor is an R-based open-source, open-development software project. It provides tools for the analysis and comprehension of high-throughput genomics data. First developed in 2001 by Robert Gentleman, who also co-founded R with Ross Ihaka, it is overseen by a core team based at the Fred Hutchinson Cancer Research Center, alongside several other American and international institutions.

The 20-line script that saves you hours of mind-numbing tedium

By Simon Hettrick, Deputy Director.

Apart from a brief liaison during my undergraduate years, I am cursed with a complete lack of training in programming. When I face problems that are easily solved with some basic coding, I experience the beginner’s dilemma (not unlike the problem with automation): do I choose the frustration entailed in working out how to write a short program to do the work automatically, or the monotony of performing the same simple task a thousand times by hand?

Once you’ve taken a first step into coding, and you see how quickly and efficiently it can change your work, it’s difficult to stop. My epiphany occurred when someone renamed a hundred images for me using a single command line instruction. I was going to make those changes by hand, so this little trick saved me something like an hour of tedious work. In addition to these tricks, I find short programs provide the most compelling reason for researchers to learn a bit about coding. The heavyweight software packages are, of course, very important to research, but the 20-line script that saves you hours of mind-numbing tedium is the real hero of research.
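To make the renaming trick concrete, here is a minimal sketch of the kind of short script described above. It is not the command that was actually used; the function name `rename_images`, the `prefix_001.jpg` naming pattern and the default `.jpg` extension are all assumptions chosen for illustration.

```python
from pathlib import Path


def rename_images(folder, prefix="image", extension=".jpg", dry_run=True):
    """Rename files in `folder` to prefix_001.ext, prefix_002.ext, and so on.

    Hypothetical helper for illustration. Returns a list of
    (old_name, new_name) pairs. With dry_run=True nothing is changed on disk.
    """
    folder = Path(folder)
    # Sort the files so the numbering is the same on every run.
    sources = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == extension
    )
    pairs = [
        (source, source.with_name(f"{prefix}_{number:03d}{extension}"))
        for number, source in enumerate(sources, start=1)
    ]
    # Refuse to continue if a new name would land on a different existing file,
    # so that no image is silently overwritten.
    for source, target in pairs:
        if target != source and target.exists():
            raise FileExistsError(f"{target.name} already exists in {folder}")
    if not dry_run:
        for source, target in pairs:
            if target != source:
                source.rename(target)
    return [(source.name, target.name) for source, target in pairs]
```

Calling `rename_images("photos", prefix="holiday")` only reports what would change, which makes it easy to check the plan first; passing `dry_run=False` applies the renames. The folder name `photos` is likewise just an example.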

Desert Island Hard Disks: Greg Wilson

You find yourself stranded on a beautiful desert island. Fortunately, the island is equipped with the basics needed to sustain life: food, water, solar power, a computer and a network connection. Consummate professional that you are, you have brought the three software packages you need to continue your life and research. What software would you choose and - go on - what luxury item would you take to make life easier?

Today we hear from Greg Wilson, founder of Software Carpentry.

Like most people who have moved from programming to managing (and in my case, teaching), I feel nostalgic for the good old days when all I had to do was fix memory leaks in multi-threaded C++. So when I imagine being stranded on a desert island, my first thought is, I could use the time to learn how to program again! The reality is that programming has moved on in the decade since I last shipped a product, and I'd enjoy reacquainting myself with the craft I used to love.

The first software package I'd bring with me would therefore be the Glasgow Haskell Compiler (GHC). There are many newer functional programming languages, like Clojure, Scala and F#, but Haskell seems to have inherited the title of a tool for thinking about programming that for many years belonged to Scheme. Becoming a native speaker of Haskell would, I hope, force me to see all of programming with fresh eyes, and re-instill the sense of wonder I first felt when writing a small Pascal interpreter in Pascal more than thirty years ago.

Women! Science is not for you!

By Pam Cameron, Managing Director of Novoscience, and Clare Taylor, Lecturer in medical microbiology, Edinburgh Napier University.

This article is part of our series Women in Software, in which we hear perspectives on a range of issues related to women who study and work with computers and software.

The title of this blog might seem preposterous given that numbers of female undergraduates in many STEM (science, technology, engineering, and maths) subjects are on the rise. However, humour us for a few moments and read on.

We seem to be talking a lot about women these days. For example, women in technology, women in engineering, women in boardrooms, women in sport, women in computing, and so on. It’s good to talk about women, but in all honesty, we’ve been doing this for the last 50 years, and guess what? We’re still having the same conversations.

– getting the facts straight during humanitarian disasters

By Victor Naroditskiy, Post-Doctoral researcher on the ORCHID project, University of Southampton.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

We live in what has been described as the Information Age, but at times a better term would be the Disinformation Age. This is due to the sheer volume of information being propagated by the Internet, and in particular, social media, every day at the click of a mouse. Finding out the truth in this sea of contradictory material has, as a result, become increasingly difficult.

Can a coordinated collective effort be effective in quickly discerning true answers to key questions? Crowdsourcing has been used to solve several seemingly impossible search tasks, but there have been many more failures.

Optimising OpenMP implementation of MD modelling package Tinker

By Weronika Filinger, Application Developer at EPCC.

Do you use scientific codes in your research? In this article I will briefly describe the process I followed to optimise the parallel performance of a computational chemistry package, TINKER, as part of the EPCC/SSI APES project.

TINKER can be used to perform molecular modelling and molecular dynamics simulations. Originally written in Fortran 77 and currently in the process of being ported to Fortran 90, it has already been parallelised for a shared memory environment using OpenMP. However, the code does not scale well with an increasing number of cores, and the scaling is even poorer on AMD architectures. In my investigation I used a cluster machine hosted by EPCC consisting of 24 compute nodes, each with four 16-core AMD Opteron 6276 2.3 GHz Interlagos processors, giving a total of 64 cores per node that share memory. As the current parallelisation of TINKER is purely for shared memory, all my investigations were restricted to a single node.
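One common way to reason about why a shared memory code stops scaling is Amdahl's law, which bounds the speedup by the fraction of run time that stays serial. The sketch below is a general illustration and not the analysis carried out on TINKER; the serial fraction of 0.05 is a made-up value, and the function names are hypothetical. Only the 64 cores per node comes from the description above.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup if `serial_fraction` of the run time is serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def parallel_efficiency(serial_fraction, cores):
    """Speedup divided by core count: 1.0 would mean perfect scaling."""
    return amdahl_speedup(serial_fraction, cores) / cores


# Four 16-core processors per node, as described above.
CORES_PER_NODE = 4 * 16

# Assumed value for illustration only, not a measurement of TINKER.
ASSUMED_SERIAL_FRACTION = 0.05

for cores in (1, 2, 4, 8, 16, 32, CORES_PER_NODE):
    speedup = amdahl_speedup(ASSUMED_SERIAL_FRACTION, cores)
    efficiency = parallel_efficiency(ASSUMED_SERIAL_FRACTION, cores)
    print(f"{cores:3d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

Under this simple model the speedup can never exceed one divided by the serial fraction, however many cores are added, which is why reducing serial sections and synchronisation is usually a first target when optimising an OpenMP code.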

Making the dead (Trigonotarbid) walk

By Russell Garwood, 1851 Royal Commission Research Fellow at the School of Earth, Atmospheric and Environmental Science, University of Manchester.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Palaeontology is often thought of as an antiquated field full of elderly researchers, but the discipline as it is today rarely matches this stereotype. Modern studies are multidisciplinary and use a diverse array of techniques to investigate the history of life. In some cases, palaeontologists can even make the dead walk.

When we think of the first animals to live on land we tend to imagine ungainly, fish-like creatures lolloping onto a beach somewhere around 385 million years ago.

Hackathons with a difference: writing collaborative papers

By Derek Groen, Fellow and Research Assistant, Centre for Computational Science, University College London.

This September I will host the first Paper Hackathon event in Flore, Northamptonshire, with help from Joanna Lewis and support from both the Software Sustainability Institute and 2020 Science.

To my knowledge, this is the first time anyone has organised a Paper Hackathon, although amusingly there has been a Hackathon focused on the iPhone Papers app. With that in mind, let me share how I thought of this, and why I think it is a good idea.

The idea

2013 was quite an eventful year for me. I became a Fellow of the Software Sustainability Institute, and also an Associate Fellow of the 2020 Science project. In doing so, I got to know two groups with quite different backgrounds but an unexpectedly good match in aims and objectives.

Bioconductor conference – R-based and open-sourced

By Laurent Gatto, Software Sustainability Institute Fellow.

This past week saw the yearly Bioconductor conference take place at the Dana-Farber Cancer Institute, Boston, MA. It started with a Developer Day on July 30th and continued with scientific talks and workshops until August 1st.

Bioconductor is an R-based open-source, open-development software project that provides tools for the analysis and comprehension of high-throughput genomics data. It was set up in 2001 by Robert Gentleman, who co-founded R alongside Ross Ihaka, and is overseen by a core team based primarily at the Fred Hutchinson Cancer Research Center in Seattle, WA, together with members from a range of other US-based and international institutions.

A sprint to new materials for Software Carpentry

By Aleksandra Pawlik, Training Lead.

Nine people, two days, two venues, two proper Polish lunches, many pull requests and one tour of the supercomputer centre. This sums up the Software and Data Carpentry sprint in Krakow, during which we created new materials for Software Carpentry and updated existing ones. The team in Poland was one of nineteen teams taking part in the Mozilla Science Lab global sprint on 22-23 July 2014.

The idea of the sprint was based on the Random Hacks of Kindness approach, in which teams all around the world work on hacks around the clock, thanks to the different time zones in which the work is done. The teams in the Software Carpentry sprint were located in Australia, New Zealand, Europe, the US and Canada. When the teams in Australia and New Zealand were finishing their day, they handed over to the European groups. Around lunchtime in Europe, the (early bird!) teams in North America were starting to join in. All sites had webcams streaming live video. We could see and talk to each other, and it was really motivating to see people around the world hacking away or writing new materials for lessons.