Earlier this week, the first paper was published describing the use of Oxford Nanopore MinION data to solve a biological question. The paper, entitled “MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island” came out in Nature Biotechnology (ReadCube link).
I was a reviewer for this manuscript. I have posted my two (signed) review reports on publons. As the authors made data and code available (as it should be), I made a (mostly successful) effort to reproduce the computational part of the paper. After finishing the report on the second version, I could not resist having a further look at some of the results. This led to me sending some plots to the authors, and one of these plots ended up becoming figure 1. It was a lot of fun to see it in the final version.
Two days ago, a paper appeared in Nature Scientific Data by Kristi Kim et al., titled “Long-read, whole-genome shotgun sequence data for five model organisms”. This paper describes the release of whole-genome PacBio data, generated with quite recent chemistries by Pacific Biosciences and others, for five model organisms: Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster.
Beyond the datasets described in the paper, Pacific Biosciences has also released whole-genome data for the human genome and, very recently, for Caenorhabditis elegans using the latest P6/C4 chemistry. Check out PacBio DevNet, which also has data for other applications.
I think it is fantastic that Pacific Biosciences releases these datasets as a service to the community (and, obviously, to showcase their technology). Company-generated data often represents the best possible data, as it is produced by people with a lot of experience with the technology. It remains to be seen whether ‘regular’ owners of a PacBio RS II instrument can reach the same level of data quality. Nonetheless, these datasets are very helpful for teaching (see my previous blog post), for comparisons with other technologies (I wish I could make time to thoroughly compare PacBio data to Moleculo data available for the same species), and for the development of new software applications.
We recently held the third instalment of the course in High Throughput Sequencing technologies and bioinformatics analysis. This course aims to give students, as well as users of the organising service platforms, basic skills to analyse their own sequencing data using existing tools. We teach both Unix command-line tools and the Galaxy web-based framework.
I coordinate the course, but also teach a two-day module on de novo genome assembly. I keep developing the material for this module, and am increasingly relying on material openly licensed by others. To me, it is fantastic that others are willing to openly share material they developed, for others to (re)use. It was hugely inspiring to discover material such as the assembly exercise, and the IPython notebook to build small De Bruijn Graphs (see below). To me, this confirms that ‘opening up’ in science increases the value of material by many orders of magnitude. I am not saying that the course would have been impossible without this material, but I do feel the course has become much better because of it.
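To give an idea of what such a notebook lets students explore, here is a minimal Python sketch (not the notebook itself; the function name `de_bruijn_graph` is hypothetical) that builds a small De Bruijn graph under one common convention: every distinct k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Hypothetical helper: build a De Bruijn graph from reads.

    Nodes are (k-1)-mers; each distinct k-mer adds an edge from its
    prefix (k-1)-mer to its suffix (k-1)-mer. Edge multiplicities are
    ignored here for simplicity; real assemblers usually keep counts.
    """
    # Collect every distinct k-mer across all reads.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    # Sort successors so the output is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    reads = ["ACGTC", "CGTCA"]  # two overlapping toy reads
    for node, successors in sorted(de_bruijn_graph(reads, 3).items()):
        print(node, "->", ", ".join(successors))
```

For these two overlapping toy reads with k = 3, the graph is a single path AC → CG → GT → TC → CA, which spells out the sequence ACGTCA. Changing k or introducing a repeat in the reads quickly shows why real graphs become tangled.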
‘Open’ made this course possible
This course used:
openly available sequencing data released by the sequencing companies (although some of the Illumina reads are behind a – free – login account)
sequencing data made openly available by individual researchers
code developed for teaching made available by individual researchers under a permissive license
As it is out in the open that I was one of the reviewers of the ‘HGAP’ paper, I thought I might as well make my review publicly available.
I have posted the review report (from February 2013) online at Publons. The review was actually done together with a PhD student in the group, Ole Kristian Tørresen (I like to do reviews together with others, it leads to better reviews and is a great learning experience for students!).
Last month, a new paper appeared in BMC Bioinformatics, entitled “Automated ensemble assembly and validation of microbial genomes”. In it, the authors describe iMetAMOS, a module of the metAMOS package, for bacterial genome assembly. I was one of the reviewers (I signed my review), and post part of my review here. The full review can be found on publons.
iMetAMOS workflow. From the paper, doi:10.1186/1471-2105-15-126
I signed my review because I believe in non-anonymous peer review (see Mick Watson’s “reviewer’s oath”).
I made my review available on publons, a platform for posting pre- and post-publication peer-review reports once the article has been published, because I believe in open peer review. EDIT Adam Phillippy, the senior author on the paper, posted the authors’ response to the review reports they received as a comment on my review on publons!
I post the first part of my review here because it nicely summarises the paper and my (favourable) opinion of it. I’ll admit that I wrote these paragraphs of the review report with the idea of posting them to my blog :-)
Earlier this year, I started a petition to ask Roche/454 Life Sciences to make the Newbler software (gsAssembly, gsMapper and Amplicon Variant Analyzer) open source. See this post for the background to the petition.
Source: Wikimedia Commons, by Marcus Quigmire
When I closed the petition, 162 people had signed it (see the PDF on figshare). During the Advances in Genome Biology and Technology (AGBT) meeting in Florida, I handed over the results of the petition to two Roche representatives: Dan Zabrowski, Head of the Roche Sequencing Unit, and Paul Schaffer, Vice President of the Roche 454 Sequencing Business. See my blog post on the conversation I had with them.
Dan Zabrowski and Paul Schaffer promised me an official Roche response, and here it is (exclusively released through this blog):