A Software Carpentry-inspired workshop to improve the way we do bioinformatics in our group

About a year ago, I attended a Software Carpentry bootcamp. Software Carpentry aims ‘to make scientists more productive, and their work more reliable, by teaching them basic computing skills’. As I described in a previous blogpost, attending the bootcamp changed many aspects of the way I work. I also decided to become a Software Carpentry instructor, and together with Karin Lagesen I recently taught a bootcamp in Oslo. Our group at the Centre for Ecological and Evolutionary Synthesis (CEES) works on different aspects of fish genomics. This more or less started with our leading role in the project to sequence and assemble the genome of the Atlantic cod. As part of that project, many group members had to learn basic unix shell commands, how to build small pipelines and run programs, and sometimes a bit of scripting. Now we see many more people at the CEES moving into bioinformatics, often because they have started to get high-throughput sequencing data. Thus, we have many self-taught bioinformaticians at the centre; in other words, the ideal target audience for Software Carpentry.

To help bring the principles of Software Carpentry into the bioinformatics work in our group, we are going to run an ‘extended’ workshop, spread over several weeks with one half-day session each week. In this post, I present the intended subjects. Each one will be introduced by a teacher (some by invited speakers, some by me), followed by practical application of the material, preferably on participants’ own data. Between sessions, participants are encouraged to do some exercises and to keep trying out the material so that it sinks in. I intend to report on our progress and experiences on this blog. For all modules, a prerequisite is a comfortable working knowledge of the unix shell. The exact order is still a bit up in the air. Feedback is welcome (in the comments section)!

  • Effective use of the shell
    • basic shell scripting
    • for loops
    • how to automate and abstract when possible and practical
  • Version control I: basics
    • basics: setting up a repo, committing, diff, log, turning back the clock
    • setting up and using a personal (university) repository
    • using a collaborative repository
    • working with public repositories
  • Using ‘make’ to automate your work and make it more reproducible
    • basics of make
    • executing and developing your pipeline through a makefile
    • making your work reproducible with make
  • From complex shell command pipeline to python script
    • no python knowledge is necessary, but it surely will help. At the very least, this module will hopefully inspire you to learn (more) python
    • examples of unix pipelines from participants
    • reworked into small python scripts and explained
  • IPython notebook
    • basic usage
    • reporting your work in a notebook
    • sharing notebooks
  • How we organise the bin, src and scripts folders
    • cleanup and reorganisation of the bin folder
    • differentiate source code files, binaries and scripts
    • using the module system
  • Porting your analysis pipeline over to Galaxy
    • basics of Galaxy
    • making a workflow page to be distributed with your paper
  • Reproducible research
    • practical aspects of enhancing reproducible research
    • tools to help achieve reproducible research
    • sharing of code, workflows, data
  • Version control II: advanced and collaborative code development
    • branching and merging
    • pull requests
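
To give a flavour of the ‘From complex shell command pipeline to python script’ module, here is a minimal sketch of what such a rework could look like. The pipeline cut -f1 hits.tsv | sort | uniq -c | sort -rn | head -n 10, the file name hits.tsv, the script name count_column.py and its options are hypothetical examples chosen for illustration, not taken from participants’ work. The pipeline counts how often each value occurs in the first column of a tab-separated file and shows the ten most frequent ones; the script does roughly the same.

```python
#!/usr/bin/env python3
"""Count how often each value occurs in one column of a tab-separated file.

A rough Python equivalent of the hypothetical shell pipeline
    cut -f1 hits.tsv | sort | uniq -c | sort -rn | head -n 10
"""
import argparse
import sys
from collections import Counter


def count_column(lines, column=0, sep="\t"):
    """Return a Counter of the values in `column` (0-based) of each line.

    Unlike the shell version, empty lines and '#' comment lines are skipped,
    and lines with too few fields are ignored.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split(sep)
        if column < len(fields):
            counts[fields[column]] += 1
    return counts


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Count values in one column of a tab-separated file.")
    # argparse.FileType treats "-" as standard input, so the script can
    # also sit at the end of a pipe: ... | python3 count_column.py -
    parser.add_argument("infile", nargs="?", type=argparse.FileType("r"),
                        default=None,
                        help="input file, or - for standard input")
    parser.add_argument("-c", "--column", type=int, default=1,
                        help="1-based column number, as in cut -f")
    parser.add_argument("-n", "--top", type=int, default=10,
                        help="number of most frequent values to print, as in head -n")
    args = parser.parse_args(argv)
    if args.infile is None:
        # No input given: show usage instead of waiting on the terminal
        parser.print_help()
        return
    if args.column < 1:
        parser.error("--column must be 1 or larger")
    counts = count_column(args.infile, column=args.column - 1)
    # Print "count<TAB>value", most frequent first (like sort -rn | head)
    for value, n in counts.most_common(args.top):
        print(f"{n}\t{value}")


if __name__ == "__main__":
    main()
```

Used, for example, as python3 count_column.py -c 1 -n 10 hits.tsv. Replacing sort | uniq -c with collections.Counter avoids sorting the whole file, and the named options make the script easier to read, reuse and extend than the one-liner.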

6 thoughts on “A Software Carpentry-inspired workshop to improve the way we do bioinformatics in our group”

  1. I really like the curriculum! (Of course, since it overlaps to a large extent with what I find myself doing – but I sometimes feel I am the last remaining command line user, so it is reassuring to see there are more out there 🙂) I’m curious: what are you using for version control? (I’ve been using darcs, which is very lightweight and easy – but I guess git is the way to go these days. Or?)

    Anyway, I think it is a great initiative, and am looking forward to hearing how it goes.

    • Thanks! We will use git for version control, because it is quickly becoming (has become?) the system of choice, and the University of Oslo offers local, shared, and public repositories.

  2. Sounds like a useful syllabus. We are covering similar steps in an Intro Bioinformatics class. Have you uploaded your slides to GitHub, SlideShare or something similar?
