WhatsApp Scraping

Posted by g.law on 21 October 2019 - 7:15am
[Image: mobile phone messaging screen. Photo by Christian Wiediger on Unsplash]

When researchers Drs Gary Motteram, Susan Dawson and Amanda Banks Gatenby from the University of Manchester's School of Environment, Education and Development wanted to analyse WhatsApp messages, they came to Research IT for help.

They were interested in performing qualitative analysis of social media data, specifically WhatsApp messages from their communications with the school leaders they are working with in Côte d’Ivoire. The researchers wanted to see whether patterns or other interesting features emerged from the messages.

Joshua Woodcock, Research Software Engineer (RSE), wrote a script that parses the exported message text and produces a streamlined file that keeps, for each message, the sender, the timestamp and the message itself. Unnecessary information, such as notices that somebody has joined or left the chat or changed the group icon, is discarded.
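To illustrate the general idea, here is a minimal sketch of this kind of parser. It is not Joshua's code: it assumes an Android-style export in which each message starts with a line like `21/10/2019, 07:15 - Alice: Hello`, and the names `parse_chat` and `to_csv` are hypothetical. Lines without a timestamp are treated as continuations of a multi-line message, and timestamped lines with no `Sender: ` prefix are assumed to be system notices and dropped.

```python
import csv
import io
import re

# Assumed (hypothetical) export layout: "21/10/2019, 07:15 - Alice: Hello there".
# Real exports vary by platform, locale and app version.
MESSAGE_START = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - (?P<rest>.*)$"
)


def parse_chat(lines):
    """Return (timestamp, sender, message) tuples, skipping system notices."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = MESSAGE_START.match(line)
        if match is None:
            # No timestamp: append to the previous message (multi-line message).
            if current is not None:
                current[2] += "\n" + line
            continue
        timestamp = f"{match['date']} {match['time']}"
        sender, sep, text = match["rest"].partition(": ")
        if not sep:
            # Heuristic: no "Sender: " prefix means a system notice,
            # e.g. "Bob left" or "Alice changed this group's icon".
            current = None
            continue
        current = [timestamp, sender, text]
        records.append(current)
    return [tuple(record) for record in records]


def to_csv(records):
    """Serialise parsed records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", "sender", "message"])
    writer.writerows(records)
    return buffer.getvalue()


sample = [
    "21/10/2019, 07:15 - Alice: Hello everyone",
    "21/10/2019, 07:16 - Bob joined using this group's invite link",
    "21/10/2019, 07:17 - Bob: Hi Alice,",
    "see you tomorrow",
]
print(to_csv(parse_chat(sample)))
```

Here the join notice is discarded and Bob's two-line message is kept as a single record; the `csv` module quotes the embedded newline so the output stays one row per message.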

During the project, WhatsApp changed the format of exported chats, which meant that the original script only worked on newly exported messages, not older ones. Thankfully, Joshua was able to implement a fix, so the script now works on any exported WhatsApp chat (for now!).
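The post does not say exactly how the export format changed, so the following is only a sketch of one common way to tolerate such changes: keep a list of known line layouts and timestamp formats, and try each in turn. The two layouts below (`21/10/2019, 07:15 - Alice: Hello` and `[21/10/2019, 07:15:32] Alice: Hello`) and the function names are illustrative assumptions, not the formats or code used in the project.

```python
import re
from datetime import datetime

# Hypothetical message-header layouts; append a new pattern when a new export format appears.
HEADER_PATTERNS = [
    # e.g. "21/10/2019, 07:15 - Alice: Hello"
    re.compile(r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}) - (?P<rest>.*)$"),
    # e.g. "[21/10/2019, 07:15:32] Alice: Hello"
    re.compile(r"^\[(?P<stamp>\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}:\d{2})\] (?P<rest>.*)$"),
]

# Timestamp formats matching the layouts above (day/month/year assumed).
TIMESTAMP_FORMATS = ["%d/%m/%Y, %H:%M", "%d/%m/%Y, %H:%M:%S"]


def parse_timestamp(stamp):
    """Convert a timestamp string using the first format that fits."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised timestamp: {stamp!r}")


def parse_header(line):
    """Return (datetime, rest_of_line) for a message-start line, or None otherwise."""
    for pattern in HEADER_PATTERNS:
        match = pattern.match(line)
        if match:
            return parse_timestamp(match["stamp"]), match["rest"]
    return None


for line in [
    "21/10/2019, 07:15 - Alice: Hello",
    "[21/10/2019, 07:15:32] Bob: Hi",
    "continuation text",
]:
    print(parse_header(line))
```

Keeping the formats in data rather than hard-coding one layout means that supporting a future export change is a matter of adding a pattern, which is the kind of adaptability the post argues for.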

Although the researchers have only just got access to the script, it has already shown that it can process more than 2,000 messages in a matter of seconds, paving the way for the analysis of many more messages.

This small project was an ideal demonstration of how quickly technology changes, even over the course of a short-term project, and of how important it is to make your code robust, so that it can be reused in the future and easily adapted when, not if, the technology changes.

Joshua’s code has been made open source so that anyone can use it. The code, along with instructions, can be found in the Research IT GitHub repository.

This post was first published on the University of Manchester Research IT blog.


Want to discuss this post with us? Send us an email or contact us on Twitter @SoftwareSaved.