Choosing a repository for your software project
By Neil Chue Hong.
Once your software has left the confines of your own machine, four things are needed for its successful development: a website, a mailing list, an issue tracker and a code repository.
Although most of the infrastructure needed by your project can be set up on your own systems, there are many tools and services that can help you to develop, maintain and publish your software. This guide provides an overview of the different options for repositories, and looks at some of the decisions you will need to make before choosing a repository. Other SSI guides take a more detailed look at specific repositories.
We've also written a blog post about the experience one of our staff members had when choosing a code repository. It provides further information to help you decide which repository to choose.
Why write this guide?
We received a lot of questions about repositories following the news of the impending closure of NeSCForge (a repository run by the National e-Science Centre). We wrote this guide to answer those questions, and to help people choose an appropriate repository for their project.
Which repository is right for your project?
The first step when choosing a repository is to list your requirements. To help with this process, we have listed the factors that you should consider at the end of this guide. The next step is to decide whether to use a hosted service, use an institutional repository or run the infrastructure yourself.
What hosted services are available?
Hosted services are generally used when your software project has collaborators and committers spread across more than one institution. Some of the more popular public hosted services (according to the number of registered users and hosted projects) are listed below.
SourceForge (Supports: CVS/SVN/Git/Mercurial/Bazaar; Established: 1999; Users: >2m; Projects: >200k)
SourceForge is the best-known software project hosting site, and also the largest. It provides most of the features you would expect from a repository, as well as services to help recruit new developers. Some users have found that the server can be a little sluggish at times of high demand. The site is primarily supported by advertisements, which may not be appropriate for all projects. SourceForge does not allow access from some countries, most notably Iran and Syria.
Google Code (SVN/Mercurial; 2005; ?; >250k)
Google provides Project Hosting. However, there are limits: only nine popular open-source licences are available and you are limited to 25 projects per person. Google Code does not allow access from some countries, most notably Iran and Syria. The repository benefits from a large community, since it is used to host most Google projects and the Google Summer of Code projects.
GitHub (Git, SVN experimental; 2008; >400k; >1.4m)
GitHub provides a more developer-focussed environment (as opposed to a project-focussed one). It is developing a strong following in the biosciences.
Codeplex (SVN/Mercurial; 2006; >150k; >15k)
Codeplex is hosted by Microsoft and is the base for many Windows- and Ajax-related projects.
Launchpad (Bazaar, CVS/SVN/Git/Mercurial import only; 2004; >1m; >15k)
Launchpad is hosted by Canonical and counts some significant projects among its users, such as Ubuntu and MySQL. It provides a system (Blueprints) for tracking features and specifications, and the Soyuz release-management system.
Assembla (SVN/Git; 2005; >180k; >60k)
Assembla has a strong following amongst smaller companies and has extensive project-management facilities in addition to software-development services.
Savannah (CVS/SVN/Git/Mercurial/Bazaar; 2000; ~50k; >3k)
Savannah hosts the majority of GNU software and some non-GNU software. Savannah's focus is on hosting for free software projects. To ensure that only free software is hosted, Savannah implements very strict hosting policies, including a ban against the use of non-free formats (such as Macromedia Flash).
There is a comparison of the features of many open source software hosting sites available on Wikipedia.
In general, almost all of these repositories cater for open-source licensed projects. They are probably not suitable if you have a closed-source code base or a mixed-licence product. In addition, you may find that the quality of service involves a trade-off: a large user base brings stability, but it also makes the service less personal.
One alternative is Bitbucket, which allows for the hosting of both private and public repositories. It is a free service for fewer than five users, with paid-for hosting plans for larger projects.
There are also services provided for a particular large community. CCPForge provides a GForge-based repository primarily for the Collaborative Computational Projects (CCPs). This repository hosts both open-source and closed-source projects. The project must include a significant contribution from a UK research group and must be performing publicly funded scientific research. It is worth noting that, although CCPForge has a multiple-backup policy, it does not guarantee safe storage of data.
Many organisations run their own version-control services, mailing-list managers and services that provide the full forge-like infrastructure. In general, these services are mainly useful if the committers and developers on your project are based at the organisation that hosts the service - although institutional repositories can usually handle a few external collaborators.
The main advantage of an institutional repository is that it is easy to work out who can help when you need something done. On the other hand, if your project has reached a truly global scale, it may not be appropriate for it to be tied to a specific institution (even if this is legally the case - see our guides on contribution licences).
Running your own infrastructure
It is relatively easy to set up and run your own revision-control system, such as CVS or SVN. It is also possible to run your own software repository using packages such as Trac, GForge, Savane (which powers Savannah and is derived from the original SourceForge software), Codendi and LibreSource.
Running your own infrastructure requires a commitment of some time to set up and maintain the installation. However, it gives you the most control over the repository and its customisation. Typically, setting up your own repository is worthwhile if you are already running other infrastructure for your project and you are expecting to host more projects in the future.
Choosing a repository for your software project is not unlike choosing where to host a website. There are many options, from running it all yourself to paying for a fully hosted service. The option you choose will depend on your circumstances: particularly the functionality you require, but also the amount of effort required to manage the project, the popularity of the service amongst the community you work in, and the size and diversity of contributors to your project.
The most important point to keep in mind when choosing a repository is that a repository which meets your needs today may not meet them in the future. You must regularly review what your repository provides in case you need to migrate to another service.
Factors to consider when choosing a repository
- What functionality do you need now?
  - Version-control system, including web interface for online code-browsing
  - Mailing lists, list management and archives
  - Basic web server for project/software pages
  - Software package hosting/publishing
  - Statistics reporting (e.g. number of commits, number of downloads)
  - Access control (e.g. setting up project-level roles)
- How easy is it to upgrade to additional functionality in the future?
- What is your preferred version-control system, e.g. CVS, SVN, Git, Mercurial?
- Is it important to have your code publicly available?
- Are all your code committers local?
- How easy is it to integrate other things you run separately (e.g. a website) with the repository?
- How good is the support for your IDEs of choice?
- Is there support for authentication systems such as OpenID or SSH keys?
- What additional forge, social-networking or project-management functionality do you want from the site? For example, GitHub is strong on social features.
- Where are similar projects to yours hosted?
- What's the speed of upload/download?
- How easy is it to back up the entire repository (code, mailing lists, issue tickets, ...)?
- How established and stable is the repository?
- How good is the user support?
- How much effort do you have to put into repository maintenance?
- Would it be better to use more than one repository, e.g. code stored in GitHub and a link to Assembla for its extra tools?
- What are the Service Level Agreements for uptime, downtime, time to fix outages and bandwidth?
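One way to turn the factors above into a decision is a simple weighted scoring table. The following Python sketch is illustrative only: the factor names, weights and scores are hypothetical placeholders, not assessments of any real service, and you would replace them with your own requirements and your own evaluation of each option.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Option:
    """One candidate way of hosting the project."""
    name: str
    scores: Dict[str, int]  # factor -> score from 0 (poor) to 5 (excellent)


def weighted_score(option: Option, weights: Dict[str, int]) -> float:
    """Return the weighted average score (0 to 5) for one option.

    A factor that has not been scored for an option counts as 0.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one factor needs a non-zero weight")
    total = sum(w * option.scores.get(factor, 0) for factor, w in weights.items())
    return total / total_weight


def rank(options: List[Option], weights: Dict[str, int]) -> List[Option]:
    """Return the options sorted from highest to lowest weighted score."""
    return sorted(options, key=lambda o: weighted_score(o, weights), reverse=True)


# Hypothetical weights: how much each factor matters to this project.
weights = {
    "private_repositories": 3,
    "preferred_vcs": 2,
    "ease_of_backup": 2,
    "low_maintenance": 1,
}

# Hypothetical scores for the three broad options discussed in this guide.
options = [
    Option("hosted service", {
        "private_repositories": 1, "preferred_vcs": 4,
        "ease_of_backup": 2, "low_maintenance": 5,
    }),
    Option("institutional repository", {
        "private_repositories": 4, "preferred_vcs": 3,
        "ease_of_backup": 3, "low_maintenance": 4,
    }),
    Option("self-run infrastructure", {
        "private_repositories": 5, "preferred_vcs": 5,
        "ease_of_backup": 5, "low_maintenance": 1,
    }),
]

for option in rank(options, weights):
    print(f"{option.name}: {weighted_score(option, weights):.2f}")
```

A table like this does not make the decision for you, but it forces you to state which factors matter most. Re-running it with updated scores is also a cheap way to carry out the regular review recommended earlier.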
If you are trying to migrate your project from one repository to another, you might also want to consider two extra factors:
- How easy will it be to transfer not just your code, but your community, to the new site (e.g. mailing list archives, wikis and user accounts)?
- Do you need to keep the revision history associated with your code, or can you start afresh?
Last updated: Tuesday 9 August 2011.