Speedrunning a Workshop on Reproducibility for Machine Learning in Science

Author(s)
Jesper Dramsch

SSI fellow

Posted on 10 March 2023

Posted by d.barclay on 10 March 2023 - 10:00am

Laptop and mug on a desk

By SSI Fellow Jesper Dramsch.

My brain works in a very special way.

Sometimes I get an email reminding me that a deadline for something is mere days away, and I will suddenly get the idea that it would be awesome to do that thing. It is how I became an SSI fellow, after all. The email in question was about the Call for Participation at PyData Global. What specifically caught my eye was the workshop section, since I had also included a workshop in my original SSI proposal for this fellowship.

A spontaneous workshop?

How would that even be possible?!

Through the power of Fellowship!

I am lucky to have access to this whole SSI community with many fellows that are also interested in improving machine learning in research.

However, all of these people tend to do awesome things and basically drown in opportunities to speak, travel, organise, and participate in events that are much more prestigious than my small idea.

I pitched my idea to three people.

I was honest that this was way too short notice to be reasonable, but I'd be more than thankful if they had time to pitch in at any level.

Two accepted!

Fellow fellows Gemma Turon and Valerio Maggio signed up for this madness (and I will forever be thankful).

Writing a proposal in no Time

So it was time to actually get that idea on virtual paper.

We needed to create something that we wanted and that PyData would accept as a workshop.

Luckily I already had a lot of material from the tutorial I held in the summer, from which we could pick and choose in case we wanted some structure.

We came up with a structure that included some invited talks and interactive sessions.

Duration | Session | Speaker
5 min | Opening | Jesper Dramsch
10 min | Opening Invited Talk - Case Study | TBD
10 min + 10 | Overview Talk + Interactive Session | Jesper Dramsch
10 min + 10 | Machine Learning Evaluation + Interactive Session | Valerio Maggio
10 min | Break |
10 min + 10 | Integrating ML + Interactive Session | Gemma Turon
10 min + 10 | Invited Talk + Interactive Session | TBD
15 min | Discussion | All Speakers
5 min | Closing | Jesper Dramsch

Then I added an abstract and a brief description while Valerio and Gemma filled in details on their sessions.

This proposal felt like something we could submit!

Then we just had to wrangle with the Pretalx platform, which consisted of me writing the proposal and then getting a link to add other organisers.

We had this done just in time.

Only to realise that PyData had pushed the deadline...

I had spent days of my vacation organising, apologising, writing, writing, writing, and frankly, I didn’t want to spend any more time on work-related things. I was in Mozambique to enjoy the food and sun, and go out every morning to dive with mantas, whales, and sharks. So we agreed to submit the proposal in the state it was in. I also submitted proposals for a talk and to hold my tutorial from EuroSciPy again because it would be beneficial to a wider audience. And then it was time to exercise those work boundaries everyone was talking about and enjoy the sun!

Then we simply had to wait.

Metrics and Evaluation and Explanation and Pipelines and Testing and Reproduction

Inviting Awesome Speakers

In November, I finally got three emails.

Rejection. My talk and tutorial were not accepted by PyData, which was a bummer.

I was nervous about opening the last email.

The workshop. Verdict: Accepted!

That meant we only had a month to invite people to the workshop. Or it would have been a month, if I hadn’t fallen ill early in November and hadn’t been pretty burnt out after two months of travelling for leisure and work.

So it was more like 2-ish weeks to invite people. Also a very short time.

Here are some tips for inviting people to a workshop:

  1. Plan some time for them to come up with and create a talk
  2. Don’t do an ML-focused workshop during NeurIPS
  3. Don’t get your Twitter banned when you need to reach out to people

Twitter would have been so easy. I was connected to some folks there who would have been great speakers. My email, instead, would get lost in a sea of other emails. But it was the only way. Twitter support had just been fired, so I had no hopes of getting my accidentally locked account back.

Gemma reached out to Mike Walmsley, a fellow here at the SSI, and I reached out to Goku Mohandas, creator of Made with ML, one of my favourite MLOps resources. They both graciously agreed!

So we reworked the schedule a bit so that we organisers would take on the brunt of the presentations. I would have felt uncomfortable asking the invited speakers for 20+ minutes with only two weeks’ notice, so I came up with this:

Duration | Session | Speaker
5 min | Opening | Jesper Dramsch
20 min | Why and how make ML reproducible? | Jesper Dramsch
25 min | Evaluating Machine Learning Models | Valerio Maggio
10 min | Invited Talk | Mike Walmsley
10 min | Break & Chat |
10 min | Testing in Machine Learning | Goku Mohandas
25 min | Integrating ML in experimental pipelines | Gemma Turon
10 min | Discussion & Audience Questions | All Speakers
5 min | Closing | Jesper Dramsch

A nice journey through different aspects of making ML reproducible in science.

So it was time to build a website (realworld-ml.xyz) and ask the SSI for funding.

A note on Doubt

Let me be very open here.

There were several points where I was ready to cancel, but Gemma and Valerio assured me this would be good in the end.

I was worried it wouldn’t get accepted. That wasn’t the case.

I was worried we wouldn’t find other speakers. That wasn’t the case.

I was worried we’d get in trouble for changing the structure. That wasn’t the case.

I was worried no one would show up. That wasn’t the case.

I was worried the speakers’ planes wouldn’t land in time. That, luckily, wasn’t the case.

I was worried time zones would mess it all up. That wasn’t the case.

I was worried I’d be a terrible host. It was ok, but I need practice.

There were many moments in which doubt crept in, but eventually, it was time to log on to Zoom and do the thing!

The Workshop

On December 2nd, I logged on to Zoom. We met Kevin, the PyData moderator.

And when Kevin said: “Well, it looks like you have a good structure going on, so I’ll just take care of the recording in the background”, that’s when I knew we would be okay!

Then people started pouring in.

We had 70 people show up for our Zoom session!

I tried my hand at emcee-ing and realised that, next time, I shouldn’t give a talk and emcee in the same workshop. That’s a rookie mistake. I also now know I should get better at keeping people on time. I was too socially awkward and wrapped up in these awesome presentations to interrupt when I should have. But once again, I was saved by Gemma, who graciously cut her presentation short to accommodate a final discussion.

Everything else went well. The otter.ai transcription ran live alongside the session, the talks were insightful, and there were interesting questions from the audience. Around 60 people stayed until the very end of the full 2 hours!

You can watch the recordings here:

What I learned from this Workshop

Let’s start with the obvious. I learned something great from every talk.

But from the organisational side, I learned that working with others can work wonders in getting things done and staying on track.

Having some extra funding to pay for accessibility tools like live transcription is a great way to upgrade your workshop!

I didn’t quite have the funds to pay the speakers. But at least I could send them a nice gift. There’s a cost-of-living crisis and the holidays were coming up, after all. This surprise was well received, and I highly recommend it!

Organising this could have gone wrong at a few breaking points, but it worked out, despite the short timelines, and I am very thankful for these high-quality contributions presented during the workshop.
 
