
Data Carpentry, running since 2014, is a programme inspired by Software Carpentry. It is a sister organisation to Software Carpentry and shares much of its community and infrastructure. The Data Carpentry programme teaches a set of recommended open source tools for reproducible and scalable data analysis: how to retrieve, view, manipulate, analyse and store your own or other people's data in an open and reproducible way, and how to work with data more effectively.
Data Carpentry workshops focus on the data lifecycle, covering data organisation, cleaning and management through to data analysis and visualisation. Unlike Software Carpentry, whose lessons are generic, domain-agnostic and focused on programming best practices in general, Data Carpentry designs its workshops to fit the needs of particular domains. Its lessons are domain-specific, with coverage of biology, genomics and social science, and lessons for new domains and disciplines (such as medicine, geography and the humanities) are being developed by the community.
As with Software Carpentry, teaching is delivered through intensive two-day workshops. The core curriculum taught at Data Carpentry workshops includes:
- Caveats of working with spreadsheets
- Cleaning data with OpenRefine
- Data manipulation and visualisation with R or Python
- Introduction to SQL and relational databases
- Automating repetitive tasks with the UNIX shell
This list is not exhaustive. All training materials are freely available from Data Carpentry's lesson repository under the Creative Commons Attribution License. Materials can be reused in any way you wish, without asking for special permission, provided that the original source is cited.
For more information about Data Carpentry, or about organising a Data Carpentry workshop in the UK, email us.