Making machine learning more reproducible with scientists at EuroSciPy 2022

Posted by s.aragon on 7 December 2022 - 10:00am

View of Rhine river

Photo courtesy of Jesper Dramsch

By Jesper Dramsch, SSI Fellow.

EuroSciPy is one of my favourite conferences. I have been going for a few years, and it’s a small community of European scientists who use or develop code in Python. Over the years, I have made friends and memories in many different countries.

But the pandemic put a harsh pause on the event.

Back in person

In fact, EuroSciPy 2022 would be my first in-person event since we all had to stay away from each other to keep each other safe.

I knew not everyone would be comfortable showing up while the pandemic was still going on. But I was hoping to see at least a few familiar faces. It just so happened to also be the perfect venue for a resource I wanted to create during my SSI Fellowship: a tutorial of quick wins for researchers to make their machine learning applications valid.

So I boarded the train to Basel in Switzerland with my half-finished tutorial in hand. (It’s a lovely train route along the Rhine river – I highly recommend it.)

Disaster on Monday morning

I had a slot on Monday morning, which isn’t necessarily my favourite time, but I was determined to make it work. According to my calendar, I had until 10:00 to arrive at the venue, register, and of course, finish my tutorial. So in my usual manner of biting off more than I can chew, I was still finishing up the code through the night.

Around 8:00, I made my way to the venue, so I’d have enough time to arrive and set everything up beforehand. I arrived and was greeted with, “We’ve been looking for you everywhere. Your tutorial should’ve started 30 minutes ago.” That sinking feeling in your stomach? That’s what I felt. But worse.

Turns out the organisers had moved my tutorial to an earlier slot and hadn’t sent out a notification, and I should’ve checked the schedule again closer to the event. They were stressed. I was stressed. It was devastating, but it happens.

Making the best of it

I gave a speedrun version of my tutorial. Additionally, the organiser had a talk slot on Thursday, where I could give a less interactive version of the tutorial and simply point to the resources. We made it work. People still loved the direct and no-nonsense approach of the tutorial and Jupyter notebooks I created.

The talk on Thursday went incredibly well, too, with a fantastic, interested audience. It probably helped that I named it “Increase citations, ease review & collaboration – Making machine learning in research reproducible”.

My machine learning reproducibility tutorial and talk

Luckily, I had given a few talks before, so I was OK improvising the length of the tutorial and cutting it down to a 30-minute talk. I built this tutorial resource because I felt like much of reproducibility was too theoretical and abstract. Researchers are strapped for time with too many things to do, so adding another unenforceable “but you have to” seems moot. So I created those notebooks to make it easy to copy a code snippet, re-use it, and make sure the resulting insights are valid.

That turned out to be the right idea. After both my tutorial and talk slot, I had some very interesting conversations. A few even showed interest in my mini e-book “Making machine learning work in the real world”, which I give away with my newsletter. Some went so far as to assume that I must be a computer scientist since I was promoting good software practices in research. But I’m just an applied physicist with some opinions. In the end, people in the audience appreciated the direct and applicable information that comes from experience.

Photo of EuroSciPy22

Making even more of it

The event itself was great. I saw some old friends and made new connections. I saw Gaël Varoquaux, core developer of scikit-learn and research director at Inria, talk about machine learning evaluation and missing values, which is always lovely. In fact, it was an expansion of a shorter talk he gave at my first EuroSciPy in 2018, which informed a lot of my decisions about what to focus on in my own machine learning research and education.

I got to celebrate the new jobs of some friends, like both Cheuk Ting Ho and fellow Fellow Valerio Maggio getting open-source developer advocacy roles at Anaconda Inc. And then there was exploring Basel together, and food, and chats, and drinks, and everything I missed about in-person conferences.

I've missed this

I always saw in-person conferences as a way to make connections. The talks are nice. But truly bonding over the weird lasagna and having a pint during golden hour at the riverside is what makes these events special. The huge amount of information from tutorials, talks, and side chats is a nice baseline, though. Months later, I sometimes see people talk about these topics, such as reproducibility in real-world machine learning, and I can link to the resource I created. A lovely bonus!

Maybe you’d be interested too

If you're interested in my tutorial, then check it out on GitHub.