Writing readable source code
By Mike Jackson.
Readable source code is vital
If developers are to quickly and easily understand your source code, it must be readable. The Software Sustainability Institute can provide advice and guidance on producing readable source code. We can even review your source code to see whether it can be improved.
Why write this guide?
With the rise of open-source software, and the associated need to create and extend readable source code, we thought this guide would be a useful resource for open-source developers.
Why is readable source code important?
Source code is designed for humans. It may end up being processed by a machine, but it evolves under the hands of human developers who need to understand what the code does and where changes need to be made. Writing readable code costs only a fraction more than writing unreadable code, but the payback is immense.
Everyone has experience of slowly wading through badly written and unreadable code. Apart from being a frustrating task, it's also a waste of time that could be more profitably invested in improving the code. Readable source code allows developers to spend more time writing code and less time trying to understand it, and that saves money.
There's also your image to consider. If your code is badly laid out, messy and cryptic, developers will assume that it is also buggy and sloppily written. This leads to a mistrust of your software, which can reduce its uptake.
Formatting
The formatting or appearance of code determines how quickly and easily a developer can understand what it does. A compiler will see no difference between the examples below, but the second example will be more easily understood by a developer:
// Example 1: unformatted code.
public class Functions
{
public static int fibonacci(int n)
{
if (n < 2)
{
return 1;
}
return fibonacci(n-2) + fibonacci(n-1);
}
public static void main(String[] arguments)
{
for(int i=0;i<10;i++)
{
print("Input value:"+i+" Output value:"+power(fibonacci(i), 2)+1);
}
}
}
// Example 2: formatted code.
public class Functions
{
    public static int fibonacci(int n)
    {
        if (n < 2)
        {
            return 1;
        }
        return fibonacci(n-2) + fibonacci(n-1);
    }
    public static void main(String[] arguments)
    {
        for (int i = 0; i < 10; i++)
        {
            print("Input value:" + i +
                  " Output value:" +
                  power(fibonacci(i), 2) + 1);
        }
    }
}
Indentation makes a clear connection between blocks of code and the classes, functions or loops to which they belong. If a statement is longer than a single line on screen, indentation helps the developer understand where the statement begins and ends. Whitespace makes the code appear less cluttered and allows the grouping together of logically-related elements (like constants or local variable declarations).
In many languages indentation is purely cosmetic (e.g. C/C++ or Java) and the number of spaces used to indent code is left to the developer to decide. However, in certain languages (e.g. Python or Occam) indentation is more restrictive because it has semantic significance: it defines a loop body or a function body.
Many programming environments (e.g. Eclipse, JBuilder, NetBeans and Microsoft Visual Studio) provide support for code formatting, and certain text editors (e.g. Emacs) can be extended with support for language-specific indentation. These tools may need careful configuration to ensure that the code looks the same in every tool used to view it. For example, a file indented with tab characters can appear differently in two editors that are set to different tab widths.
Formatting concerns can also influence design. A function with seven arguments might not be very readable on screen. To make it more readable, you could create a new data structure or class to hold some of the arguments. You could also break up a function that cannot be viewed on one screen into a number of smaller functions that can.
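As a minimal sketch of grouping arguments into a class, the Java below shows a seven-argument method next to an equivalent method that takes a single object. Every name here (Vector3, Particle, Mechanics, angularMomentum) is hypothetical and chosen only for illustration.

```java
// Hypothetical example: all class and method names are invented for illustration.

/** A point or direction in three-dimensional space. */
final class Vector3
{
    final double x;
    final double y;
    final double z;

    Vector3(double x, double y, double z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Returns the cross product of this vector with another. */
    Vector3 cross(Vector3 other)
    {
        return new Vector3(y * other.z - z * other.y,
                           z * other.x - x * other.z,
                           x * other.y - y * other.x);
    }

    /** Returns the Euclidean length of this vector. */
    double length()
    {
        return Math.sqrt(x * x + y * y + z * z);
    }
}

/** Groups the mass, position and velocity that are always passed together. */
final class Particle
{
    final double mass;
    final Vector3 position;
    final Vector3 velocity;

    Particle(double mass, Vector3 position, Vector3 velocity)
    {
        this.mass = mass;
        this.position = position;
        this.velocity = velocity;
    }
}

final class Mechanics
{
    // Before: seven arguments of the same type, easy to pass in the wrong order.
    static double angularMomentum(double mass,
                                  double x, double y, double z,
                                  double vx, double vy, double vz)
    {
        double cx = y * vz - z * vy;
        double cy = z * vx - x * vz;
        double cz = x * vy - y * vx;
        return mass * Math.sqrt(cx * cx + cy * cy + cz * cz);
    }

    // After: a single argument whose type documents what is expected.
    static double angularMomentum(Particle particle)
    {
        return particle.mass
            * particle.position.cross(particle.velocity).length();
    }
}
```

Both versions compute the same value. The second is shorter to read at the call site, and the compiler can catch a position passed where a velocity was expected only if the two are given distinct types, which is a possible further refinement.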
Naming
The careful selection of names is very important to developer understanding. Cryptic names of components, modules, classes, functions, arguments, exceptions and variables can lead to confusion about the role that these components play. Good naming is fundamental to good design, because source code represents the most detailed version of your design. Compare and contrast the ease with which the following statements can be understood:
out(p(f(v), 2) + 1);
print(power(fibonacci(argument), 2) + 1);
There are common naming recommendations. Modules, components and classes are typically nouns (e.g. Molecule, BlackHole, DNASequence). Functions and methods are typically verbs (e.g. spliceGeneSequence, calculateOrbit). Boolean functions and methods are typically expressed as questions about properties (e.g. isStable, running, containsAtom).
Naming also relates to the use of capitalisation and delimiters, which can help a developer to quickly determine if something is a function, variable or class. Common guidelines for C and Java include:
- Constants should be written in upper case, with underscores separating words: PI, MAXIMUM_VALUE.
- Class names should start with a capital letter, with the first letter of each subsequent word also capitalised (this is called upper camel case): Molecule, BlackHole, DNASequence.
- Functions should start with a lower-case letter, with the first letter of each subsequent word capitalised (lower camel case): spliceGeneSequence, calculateOrbit.
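The naming guidelines above can be seen together in one small Java class. This is only a sketch: the class borrows the DNASequence name from the examples above, but its fields and methods (MAXIMUM_LENGTH, countBase, containsBase, isEmpty, exceedsMaximumLength) are hypothetical.

```java
// Hypothetical class: the members exist only to illustrate the naming guidelines.
final class DNASequence
{
    // Constant: upper case, words separated by underscores.
    static final int MAXIMUM_LENGTH = 1000;

    // Variable: a noun starting with a lower-case letter.
    private final String bases;

    DNASequence(String bases)
    {
        this.bases = bases;
    }

    // Boolean methods: phrased as questions about a property.
    boolean isEmpty()
    {
        return bases.isEmpty();
    }

    boolean containsBase(char base)
    {
        return bases.indexOf(base) >= 0;
    }

    boolean exceedsMaximumLength()
    {
        return bases.length() > MAXIMUM_LENGTH;
    }

    // Method: a verb phrase describing the action performed.
    int countBase(char base)
    {
        int count = 0;
        for (int i = 0; i < bases.length(); i++)
        {
            if (bases.charAt(i) == base)
            {
                count++;
            }
        }
        return count;
    }
}
```

A call such as new DNASequence("GATTACA").countBase('A') then reads almost like a sentence, which is the point of choosing names carefully.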
Comments
Source code tells a developer what the code does. Comments allow you to provide the developer with additional information. A developer should be able to understand a single function or method from its code and its comments, and should not have to look elsewhere in the code for clarification. It's easy to get lost in your own code, but remember that the developer who reads your comments will not have the same knowledge of the project or code as you do.
The kinds of things that need comments include:
- Why certain design or implementation decisions were adopted, especially in cases where the decision may seem counter-intuitive.
- The names of any algorithms or design patterns that have been implemented.
- The expected format of input files or database schemas.
There are some restrictions. Comments that simply restate what the code does are redundant. Comments must be accurate, because an incorrect comment causes more confusion than no comment at all.
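To illustrate the difference between a redundant comment and a useful one, here is a short Java sketch. The class and method names (SortedSearch, indexOf) are hypothetical. The comments name the algorithm and explain one decision that might otherwise look odd.

```java
// Hypothetical example: names are invented for illustration.
final class SortedSearch
{
    /**
     * Finds the index of a value in an array sorted in ascending order.
     * Implements binary search. Returns -1 if the value is absent.
     */
    static int indexOf(int[] sortedValues, int target)
    {
        int low = 0;
        int high = sortedValues.length - 1;
        while (low <= high)
        {
            // A redundant comment here would be "add half the gap to low",
            // which only restates the code.
            // Useful comment: the more obvious (low + high) / 2 can overflow
            // an int for very large arrays, so the midpoint is computed from
            // the gap between the bounds instead.
            int middle = low + (high - low) / 2;
            if (sortedValues[middle] < target)
            {
                low = middle + 1;
            }
            else if (sortedValues[middle] > target)
            {
                high = middle - 1;
            }
            else
            {
                return middle;
            }
        }
        return -1;
    }
}
```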
Coding conventions
As each language has its own syntax, semantics and sets of built-in commands, what constitutes readable code differs across programming languages. What is readable is also affected by the opinions and preferences of the individual reader. Nevertheless, a number of language-specific coding conventions have evolved, reflecting both general and language-specific good practice.
It's recommended that projects adopt a set of coding conventions. Not only does this promote readable code, it helps ensure that the code looks consistent, even if the software consists of hundreds of source code files and is worked on by many developers. Projects as varied as Mozilla, Linux, Apache, GNU, and Eclipse all have their own project-specific conventions that their developers are expected to conform to.
Project-specific conventions can also embody requirements specific to your project. They promote consistency of naming across packages, components, classes, or functions: 'All test classes must have the suffix Test, e.g. FourierUtilitiesTest'. They ensure that others know who owns the copyright on your open-source code: 'All source-code files must have a comment with the statement Copyright © My Organisation, 2010'. They ensure that others know about restrictions on your open-source code: 'All source code files should have a comment with the text "Licensed under the Apache License, Version 2.0".'
Many programming environments allow templates to be defined, which help developers conform to coding conventions for source-code files. Templates are just tools: they cannot guarantee readable code in themselves.
Code-analysis tools allow your coding conventions to be defined as rules. Your source code can then be analysed against these rules to automatically check for conformance. These tools can publish reports that highlight which rules are violated and where in the code the violations occur. Popular code-analysis tools are Checkstyle for Java, StyleCop for C# and Perl::Critic for Perl.
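To make the idea of conventions expressed as rules more concrete, here is a toy sketch in Java. It is not how Checkstyle or any other real tool is configured or implemented. The class ConventionChecker and its methods are hypothetical, and the three rules it checks (upper-case constant names, a Test suffix on test classes, and an Apache licence statement in each file) are simply examples of the kind of convention a project might adopt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical toy checker: real code-analysis tools are far more thorough.
final class ConventionChecker
{
    // Rule: constants are upper case, with words separated by underscores.
    private static final Pattern CONSTANT_NAME =
        Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    // Rule: class names are in upper camel case.
    private static final Pattern CLASS_NAME =
        Pattern.compile("[A-Z][A-Za-z0-9]*");

    // Rule: every source file carries this licence statement.
    private static final String LICENCE_TEXT =
        "Licensed under the Apache License, Version 2.0";

    static boolean isValidConstantName(String name)
    {
        return CONSTANT_NAME.matcher(name).matches();
    }

    /** Returns one message per rule that a test class breaks. */
    static List<String> findTestClassViolations(String className,
                                                String sourceText)
    {
        List<String> violations = new ArrayList<>();
        if (!CLASS_NAME.matcher(className).matches())
        {
            violations.add(className + ": class name is not upper camel case");
        }
        if (!className.endsWith("Test"))
        {
            violations.add(className + ": test class lacks the suffix Test");
        }
        if (!sourceText.contains(LICENCE_TEXT))
        {
            violations.add(className + ": licence statement is missing");
        }
        return violations;
    }
}
```

Each rule is a small, independent check that produces a message. That is roughly the shape of the reports such tools generate, although real tools parse the source code properly rather than matching names and text.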
Source code and documentation
Certain languages have tools available that can automatically generate documentation from source code. This documentation can help developers navigate their way around your code and understand what each component does. One example is Javadoc for Java. This takes Java source code and outputs a set of HTML pages with information about classes, methods, arguments, return types and exceptions. The page content is derived from the source code itself and all pages are automatically cross-referenced. Tags can be provided in comments, and the tag used determines how that part of the comment is presented in the HTML. As a simple example, the inline tag "{@link #fibonacci(int)}" in a comment becomes a hyperlink to the documentation for the fibonacci method of the same class. Other examples of document-generation tools include Doxygen for C/C++, Fortran, C#, Java, Python and PHP; NDoc for C#; and f90tohtml for Fortran.
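As a brief sketch of what a documentation comment looks like, the Java below annotates a variant of the fibonacci function from the formatting example with standard Javadoc tags (@param, @return, @throws and the inline {@link} and {@code} tags). The class name Sequences and the check for negative input are assumptions added for illustration. The class is left package-private to keep the snippet self-contained, so the javadoc tool would need its -package option to include it. In real code it would normally be public.

```java
/**
 * Utility methods for integer sequences.
 * See {@link #fibonacci(int)} for the only method provided.
 */
final class Sequences
{
    /**
     * Returns a term of the sequence 1, 1, 2, 3, 5, 8, ...
     *
     * <p>Implemented with simple recursion, so it is only suitable for
     * small values of {@code n}.
     *
     * @param n zero-based index of the required term; must not be negative
     * @return the term at index {@code n}
     * @throws IllegalArgumentException if {@code n} is negative
     */
    static int fibonacci(int n)
    {
        if (n < 0)
        {
            throw new IllegalArgumentException("n must not be negative: " + n);
        }
        if (n < 2)
        {
            return 1;
        }
        return fibonacci(n - 2) + fibonacci(n - 1);
    }
}
```

The tool turns each tag into a labelled section of the generated page, so the argument, return value and exception are presented in the same way for every documented method.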
Be consistent
There is no single correct way to indent, use whitespace or name things, nor is there a single correct set of coding conventions to use. Writing readable code is very much dependent on your programming language, the requirements of your project and what you think of as readable. The golden rule is to be consistent. Once you've decided upon the conventions you will use, with the agreement of the other members of your team, document the conventions and stick to them. At the same time, your conventions are not set in stone. If they need to be changed or improved, then do so. Your conventions are there to help and serve you, and they will ultimately lead to readable code.
Further reading
There are many resources on the subject of readable source code.
- Programming style from Wikipedia. Includes links to commonly-used coding conventions for a number of languages.
- Indent style from Wikipedia.
- Naming convention from Wikipedia.
- Coding conventions from Wikipedia.
- Coding Style under 'Programming Tutorials - C, C++, OpenGL, STL', Cprogramming.com. A good overview of the importance of whitespace, naming and comments with examples from C/C++.
- How not to write Fortran in any language, by D. Seeley. An overview of good coding practices that are common to all languages.
Last updated: Tuesday 30 August 2011.
