London, 14-16 February 2012
Report by Mike Jackson, The Software Sustainability Institute, EPCC, The University of Edinburgh
The Software Sustainability Institute was invited by the organisers of this year’s Dev8D to run a sustainability session, so I took the train to London to attend this popular developer-centric event, funded by JISC via DevCSI and held at the University of London Union.
Highlights and ideas for SSI
Mia Ridge of the Open University did a Guerrilla/Lightweight Usability Testing session (which clashed with my session so I missed it). Based on the notion that any usability testing is better than none it’s a very lightweight process – at a minimum it can be you, a user, your software and 10 minutes! It focuses on identifying things that need to be fixed rather than generating reports. The slides are online. We should link this from our Useful resources page and briefly blog it (along with a link to http://usability uk.org).
Add a link in our Useful resources page to online resources about "code smells" and an "Ask Steve" on "oi, my code smells…how do I give it the Lynx effect?"
Write a blog post on peer review of software and ways to persuade developers to adopt a communal view of code developed on a project, rather than a defensive sense of personal ownership.
Certain sessions were "surgeries" e.g. linked data or HTML/CSS. Perhaps we could run a sustainability surgery or "Ask Steve in person!" at the next Dev8D, the Collaborations Workshop, Digital Research 2012 or other events. These could be themed (e.g. openness, maintainability, testing) or generic.
SSI T-shirts would be useful. We’d stand out in red with our disc logo on the back (or our "Death to decay" logo for a more street-hip look, ahem) and this would make us more easily identifiable and approachable.
My EPCC colleague, Radek Ostrowski, suggested posing an "SSI challenge" at the next Dev8D.
Consider contacting Christopher Gutteridge and his colleagues about the sustainability of their Graphite and Grinder linked data tools. At both sessions Chris commented that he didn’t want to be the only person who knows how to use these tools. Also, identify opportunities to make the tools consistent in branding so they are visibly part of a suite of related linked data products.
Chris has published a good blog article on the pros and cons of JSON/REST (easier to understand and start using) versus RDF (more challenging to learn but more powerful and expressive) for open data, along with recommendations for organisational open data, including supporting both RDF/SPARQL and JSON/CSV/REST.
Enter the bear-pit – peer software review
On Wednesday I ran our SSI session on peer software reviewing. This included the latest in our occasional discussions on "what makes good code good" followed by one-on-ones where attendees paired up to review each other's code. After I had sat in an empty room for 10 minutes, looking at the clock and getting ever more twitchy, 10 attendees arrived to participate.
The attendees, who were software developers, with a couple of researcher-developers, put together a list of the qualities of good code, namely:
Documented, with usage examples so it's clear not just how to run it but how to use it in your own work.
Readable, using meaningful naming, using conventions and a consistent style.
Deodorised – avoids "code smells", that is, code that just feels wrong e.g. large classes or methods, methods with many parameters, duplicated statements in both if and else blocks of conditionals etc. I’d never heard this term before – it has its origins in Chapter 3 “Bad smells in code” by Kent Beck and Martin Fowler in Fowler, Martin (1999). Refactoring: Improving the Design of Existing Code. Addison-Wesley. ISBN 0-201-48567-2. For an overview, see code smells at Wikipedia.
Concise and modular, avoiding large classes or methods or methods with many parameters.
Tested and testable via a test suite.
Clear and well-designed so that when you look at it you can understand it and believe it does indeed work.
Consistent with the conventions and patterns of its language.
Reuses and recycles where appropriate and doesn’t reinvent the wheel.
Has copyright and a licence.
Well-commented, with concise, accurate and up-to-date comments that explain why the code is as it is, written in the recommended style for that language e.g. JavaDoc or Doxygen.
Has no commented-out code, since it’s unclear whether such code is redundant, deprecated or should be uncommented in future.
Doesn’t silently or cryptically fail.
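To make a couple of the items above concrete, here is a minimal sketch in Python. The function names and the membership discount are hypothetical, invented purely for illustration and not taken from the session. The first function shows the "duplicated statements in both if and else blocks" smell; the second keeps only the differing part inside the conditional and fails loudly on bad input rather than silently.

```python
def price_with_smell(amount, is_member):
    """Smelly version: rounding and return are duplicated in both branches."""
    if is_member:
        total = amount * 0.9
        total = round(total, 2)
        return total
    else:
        total = amount * 1.0
        total = round(total, 2)
        return total


def price_refactored(amount, is_member):
    """Refactored version: only the part that differs stays in the conditional."""
    if amount < 0:
        # Fail loudly with a clear message instead of returning a nonsense value.
        raise ValueError(f"amount must be non-negative, got {amount!r}")
    rate = 0.9 if is_member else 1.0  # hypothetical 10% member discount
    return round(amount * rate, 2)
```

Both functions return the same result for valid input, so the refactoring is behaviour-preserving apart from the added input check.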
The attendees were then arranged into twos and threes and spent over an hour discussing each other's software in terms of readability, documentation, tests and general project openness, according to the interests of the individual attendees.
There was discussion at the end as to the potentially thorny issue of solo developers who may not want others looking at their code, and might be sensitive, defensive, or aggressive at the suggestion that their code should be reviewed (or changed, or fixed!). This relates to possible differences between the code's perceived ownership (the individual developer) and its actual ownership (which may be a project or organisation). How is an environment fostered in which code is viewed as being under collective ownership, with collective responsibility for its quality, maintenance and improvement? How best can developers, project leaders or PIs encourage collective ownership and peer review, and promote these as benefits in terms of encouraging developer growth and learning, improving code quality and maintainability and reducing risks (such as if a developer is hit by a bus!)?
From the comments at the end of the session it seemed that the attendees found the session a valuable experience and we hope to run this again at future events.
A big thank you from the SSI to the Dev8D organisers for allowing us to run this session!
As part of the peer review session I was given an introduction to Moodle by Matt Gibson of the University of London Computing Centre. Moodle is a free PHP-based course management system. The Moodle project demonstrates good open source maintainability practices and, as importantly, its developers are aware of their gaps. They use a recognised open source licence (GPL), GitHub for their repository, and http://moodle.org for their wiki, issue tracker, forums and e-mails. They have 287 developers with write access. Matt described intentions to provide more unit tests, an automated build-and-test infrastructure, shorter classes and methods, and a more disciplined use of their tracker (currently it’s a brain dump – suggested categories like bug, feature, nice-if etc could help).
Matt recommended two tools for managing agile projects: http://www.pivotaltracker.com, which supports virtual Post-It notes and customer prioritisation for example; and http://www.brightgreenprojects.com/
Debian and Ubuntu Software Packaging Workshop
Alexander Dutton provided an introduction to packaging software for Debian and Ubuntu. This can be done manually for the brave but it’s less traumatic to use tools. The session provided an overview of package structure, configuration files, how to build a package, Python-specific packaging (other language-specific options are also available), and APT repositories. Prerequisites are to install the debhelper, devscripts and dpkg-dev packages. Though a Linux box was provided, wi-fi problems meant I didn’t do the hands-on, but I have the slides and example files for later use.
Consuming Linked Data
Christopher Gutteridge, Ronan Klyne, and Patrick Mcsweeney gave a hands-on introduction to the Graphite PHP Library. Inspired by jQuery, this can render small RDF (.rdf or .n3) documents in HTML. There is also a Python API. It's currently under development at The University of Southampton and powers much of their Open Data Service. There is also an introduction to linked data. Graphite is very straightforward to use (jury’s out on install, since I used their Live CD to run Linux on my PC and this had Apache and PHP set up and all the paths sorted) and may be a useful tool for knocking up linked data demos quickly.
Overnight, Chris knocked up a "dumper", an online form that runs Graphite.
Creating Linked Data from Spreadsheets
Following on from Graphite, Christopher Gutteridge gave a hands-on introduction to Grinder. This takes CSV, TSV, Excel (provided a worksheet is identified), or Google spreadsheets and "grinds" these up into triples via XML and XSLT conversion. It supports various processing operations (e.g. to hash e-mail addresses) and XSLT post-processing of the RDF triples. It’s also been used to provide KML and other configuration files. Southampton use it for all their data sets and the Dev8D web site programme was constructed using it. It has a Perl-specific component but that could be rewritten in Java in a day. Other useful links in this talk were: prefix.cc, a site for namespace lookups; OpenOrg patterns for organisational data; The University of Southampton’s Grinder-produced data, with source, configuration and XSLT; and Raptor’s RDF parser, rapper.
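To give a feel for what "grinding" a spreadsheet into triples involves, here is a minimal sketch in Python. It is not Grinder's implementation (Grinder goes via XML and XSLT, and is partly Perl); it is a plain illustration of the idea. The column names (id, name, email) and the base URI are assumptions made up for the example. The FOAF properties used are real, and foaf:mbox_sha1sum is defined as the SHA-1 hash of the mailto: URI.

```python
import csv
import hashlib
import io

FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/person/"  # hypothetical base URI for this sketch


def rows_to_ntriples(csv_text):
    """Turn each CSV row (assumed columns: id, name, email) into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        # Escape backslashes and quotes so the literal stays valid N-Triples.
        name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{FOAF}name> "{name}" .')
        # Publish a SHA-1 hash of the mailto: URI rather than the raw address.
        digest = hashlib.sha1(f"mailto:{row['email']}".encode("utf-8")).hexdigest()
        triples.append(f'{subject} <{FOAF}mbox_sha1sum> "{digest}" .')
    return triples


if __name__ == "__main__":
    sample = "id,name,email\n1,Ada,ada@example.org\n"
    print("\n".join(rows_to_ntriples(sample)))
```

Each row yields two triples, and the e-mail address itself never appears in the output.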
Introduction to Google Apps Script
eBooks: Making Content for Kindle and EPUB
Anthony Levings gave an introduction and hands-on example of eBook production, validation and browsing tools and technologies. I lost the thread of the talk as it continued during the software installation at the outset. However, there was a template e-book on a USB stick passed around and so from both the talk and with Google’s help I had a hands-on introduction to various eBook areas...
Issues of DRM e.g. Apple versus Adobe, applied or not etc. Can affect users, their choice of readers and their upgrades. Some publishers may insist that no DRM be applied due to this, so readers don’t upgrade and find themselves unable to read their books.
Reflowable – designed for screens where display size is not known e.g. plain text, HTML. User may select display size, font size, line spacing etc for optimal readability.
Fixed-layout – designed for printing e.g. PDF, PS.
The IDPF (International Digital Publishing Forum), a digital publishing trade and standards body, produces the free, open EPUB standard. This defines reflowable content for device-specific display optimisation. It specifies a ZIPped directory format with eBook packaging, formatting and content, using XHTML and CSS. Version 2.1 defines OPS (Open Publication Structure – formatting of eBook content), OPF (Open Packaging Format – XML .opf file), OCF (Open Container Format – ZIP archive). The example directory structure in the session was:
mimetype – application/epub+zip
META-INF/ - just like a JAR!
container.xml – describes media-type (application/oebps-package+xml) and relative path to OPF file.
content.opf – XML description of book e.g. title, files etc.
toc.ncx – XML table of contents.
Images/ - *.jpg
Styles/ - *.css
Text/ - *.xhtml
This is then ZIPped up e.g. to dev8d.zip and renamed to dev8d.epub.
How to convert .epub (or X/HTML, XML) files to .mobi files using Amazon’s free Kindlegen command-line tool. .mobi files are compatible with the Kindle and other e-readers, and Amazon’s .azw Kindle format is essentially identical. The command is simply:
$ kindlegen.exe dev8d.epub
Viewing a .mobi file using Amazon’s free KindlePreviewer, a Kindle emulator, to check presentation issues.
Viewing a .epub file using Adobe Digital Editions, a free Flash-based tool. Just select Library => Add Item to Library. Select dev8d.epub. Click Open. Double-click on the book.
Epubcheck, a free tool, runnable at the command-line, as a web app or as a library, for validating .epub directories (though I didn’t have access to Java on my laptop so couldn’t get it to run).
Sample books are available at http://s3.amazonaws.com/kindlegen/samples.zip.
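The "ZIP it up and rename to .epub" step described above can also be scripted. The following is a minimal sketch in Python using only the standard library; the function name build_epub is my own, and the directory layout is assumed to match the example structure from the session. One detail worth knowing is that the container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which a plain "zip everything" may not guarantee.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip an unpacked EPUB directory into a single .epub file.

    epub_path should live outside source_dir, otherwise the walk below
    would try to add the archive to itself.
    """
    with zipfile.ZipFile(epub_path, "w") as archive:
        # "mimetype" goes in first and is stored without compression.
        archive.write(os.path.join(source_dir, "mimetype"), "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for folder, _subdirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive names use forward slashes, relative to the book root.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    archive.write(full_path, arcname,
                                  compress_type=zipfile.ZIP_DEFLATED)
```

For example, build_epub("dev8d", "dev8d.epub") would package a directory called dev8d (a hypothetical name) ready for checking with a validator or viewing in a reader.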
Indexing and Elasticsearch
Given by Mark Macgillivray and Richard Jones. Powerful data structures require a greater understanding of these structures and complex queries to be formulated. Sacrificing data structure complexity and connectedness can yield simpler and faster searches. Elasticsearch is an alternative to the established SOLR engine and has been available since 2010. It’s an Apache 2-licensed RESTful Lucene-based search engine, based on JSON input and output. They’ve an IRC channel, blog, code hosted on GitHub, and a Google Group mailing list. They’re used by Mozilla. Running it requires Java and Python, Python’s JSON library and the cURL tool – together these make it very easy to play with (once you’ve got all the dependencies, which required some catch-up on my part to yum install an up-to-date Python version). A simple search on 19,000,000 records is very quick on a laptop, but faceting (having filtered searches) is more challenging; ideally, a number of instances would be run with sharding (splitting data and duplicates across servers) to allow queries to be parallelised.
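The sharding idea mentioned above can be illustrated with a toy sketch in Python. This is not how Elasticsearch is implemented and it uses none of its API: the function names, the CRC32-based routing and the in-memory dictionaries standing in for servers are all assumptions made for the example, and it ignores the "duplicates" (replica copies) entirely. It only shows the principle of splitting records across shards, querying each shard in parallel and merging the results.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id, num_shards):
    """Pick a shard deterministically from the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """Split {doc_id: text} across num_shards in-memory stand-ins for servers."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, term):
    """Naive case-insensitive substring search within one shard."""
    return [doc_id for doc_id, text in shard.items() if term.lower() in text.lower()]


def search(shards, term):
    """Query every shard in parallel and merge the hits into one sorted list."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_hits = list(pool.map(search_shard, shards, [term] * len(shards)))
    return sorted(hit for hits in partial_hits for hit in hits)
```

Because each shard only holds part of the data, each per-shard search has less to scan, and the merge step is what the caller sees as a single result set.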
As an aside, the session used http://pad.cottagelabs.com/es as a collaborative scratch pad. The presenters use Elasticsearch in their BibSoup, an online bibliography resource.
Software Developer Sustainability
Presented by the catering team, the lunchtime cakes on all three days (baked cheesecake, lemon tart and chocolate tart) delivered an excellent example of how to sustain software developers throughout their afternoons.
Future DevCSI events, to be confirmed, are planned for Birmingham in May (DevEd – for educational research), Liverpool in October/November (DevXS – for students) and the next Dev8D in May 2013.