PyData London 2014

Posted by a.hay on 31 March 2014 - 2:00pm

By Mark Basham, Senior Software Scientist, Diamond Light Source and 2014 Institute fellow.

As a scientist, I was really keen on the chance to glimpse inside the world of data analytics in the financial sector, and, if nothing else, the setting for PyData London did not disappoint. Level 39 is the 39th floor of 1 Canada Square, at the heart of Canary Wharf. Its breathtaking views and modern design made for an excellent conference venue and set the mood for the event well.

PyData is all about using Python to analyse data, and as such the delegates were a mix of academic and commercial programmers, which made for an interesting diversity of presentations and conversation. There was also a two-track programme: one track generally aimed at novice Python users, the other carrying the more advanced talks.

With presentations ranging from "You give me data, I give you art" and "Adaptive Filtering of Tweets with Machine Learning" through to "Blosc: Sending data from memory to CPU (and back) faster than memcpy()" and "DX Analytics - A Python-based Library for Derivatives Analytics", there was something for everyone. In addition to the full programme, all the talk videos will be online soon.

I think that most people will have come away with vastly different things from the meeting, but for me there were a few key points. The first is that the data revolution is well under way and the ability to capture data is increasing at incredible speed: even harvesting open data (such as Twitter feeds) produces massive datasets, let alone capturing your own trading information or scientific data. Processing these vast datasets quickly is a must, but being able to adapt to the next massive dataset is sometimes as important, if not more so. It was noted that although Python excels at the second point, it does not always address the first so efficiently, although many talks at the event showed ways to close that gap.
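As a minimal, hypothetical sketch of the general idea (not code from any particular talk), compare a pure-Python sum of squares with the equivalent vectorised NumPy call, which pushes the loop into compiled code:

```python
import time

import numpy as np

data = np.random.rand(1_000_000)  # stand-in for a large dataset

# Pure-Python loop: easy to adapt, but every element passes through the interpreter
start = time.perf_counter()
total = sum(x * x for x in data)
print(f"Python loop: {time.perf_counter() - start:.3f} s")

# Vectorised NumPy: the same computation runs in compiled code
start = time.perf_counter()
total = np.dot(data, data)
print(f"NumPy dot:   {time.perf_counter() - start:.3f} s")
```

On a typical machine the vectorised version runs one to two orders of magnitude faster, which is exactly the trade-off many of the speakers were addressing.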

The second point was that many people had moved away from using Python to display the results of their data analysis, choosing instead to serve the data up to a JavaScript webpage to do the visualisation. This has several benefits, the key one being that you can easily share the results of your work, while the “arms race” (as one speaker put it) between browsers for ever-better JavaScript performance makes the display of data comparable to rich clients in many ways. The ease of running Python webservers, and Python's built-in support for JSON, simply reinforces this argument.
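As a minimal, hypothetical sketch of that pattern (Flask and the route name are my own illustrative choices, not anything prescribed at the conference), a small Python webserver can expose analysis results as JSON for a JavaScript page to visualise:

```python
# Hypothetical sketch: serve analysis results as JSON for a JavaScript
# front end to visualise. Flask and the route name are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/results")
def results():
    # In practice this would be the output of your analysis pipeline
    analysis = {"labels": ["a", "b", "c"], "values": [3.1, 4.1, 5.9]}
    return jsonify(analysis)

if __name__ == "__main__":
    app.run(port=5000)
```

A browser-side script can then request the JSON (for example with D3.js's d3.json("/results", callback)) and render it, keeping the analysis in Python and the visualisation in JavaScript.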

All in all, I found the PyData conference to be a very good place to see what advances had been made in libraries and implementations, to talk about good practices and methodologies, and to be inspired by fields other than my own, which led to some interesting developments for me. Finally, for a new user, I feel it would be a very helpful conference at which to experience the full breadth of libraries that already exist in the batteries-included world of Python.
