My review of “Automated ensemble assembly and validation of microbial genomes”

Last month, a new paper appeared in BMC Bioinformatics, entitled “Automated ensemble assembly and validation of microbial genomes”. In it, the authors describe iMetAMOS, a module of the metAMOS package, for bacterial genome assembly. I was one of the reviewers (I signed my review), and I am posting part of my review here. The full review can be found on publons.

iMetAMOS workflow. From the paper, doi:10.1186/1471-2105-15-126

I signed my review because I believe in non-anonymous peer review (see Mick Watson’s “reviewer’s oath”).

I made my review available on publons, a platform for posting pre- and post-publication peer-review reports once the article has been published, because I believe in open peer review. EDIT: Adam Phillippy, the senior author on the paper, posted the authors’ response to the review reports they received as a comment on the review on publons!

I post the first part of my review here because it nicely summarises the paper and my (favourable) opinion of it. I’ll admit that I wrote these paragraphs of the review report with the idea of posting them to my blog 🙂

Enjoy!


The field of genome assembly has seen some interesting developments recently. First, we have seen several assembly competitions (Assemblathon 1&2, GAGE and GAGE-B), often using genomes with an available reference, where no real winners could be called. Second, new applications became available for reference-free evaluation of assemblies, based on the reads that were used for generating the assembled contigs and scaffolds. The conclusions I drew from these efforts, and from our own work, are that each assembly project is unique and should be performed as a mini-assemblathon, followed by carefully selecting the best assembly for the goal of the project. Or, in other words, running multiple assemblers and using reference-free validation tools to judge them. It was therefore with great pleasure that I saw the pre-print for the iMetAMOS paper, which is also why I happily agreed to review the manuscript.

iMetAMOS aims to combine these two steps: assembly using multiple programs, followed by validation, again employing multiple tools. It then chooses an assembly based on these evaluations. In addition, iMetAMOS performs a standard contamination check – or rather, it treats each assembly as a metagenomic one, determining the possible phylogenetic origin of all contigs/scaffolds. This makes it a unique tool, one that has not been available until now. I wholeheartedly support this effort.
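
To make the idea of choosing one assembly from several validation results concrete, here is a minimal sketch. It is not iMetAMOS's actual scoring scheme: the assembler names, metric names and numbers are invented for illustration, every metric is assumed to be "higher is better", and the assemblies are simply ranked per metric with the ranks summed.

```python
from typing import Dict


def pick_best_assembly(scores: Dict[str, Dict[str, float]]) -> str:
    """Return the assembly with the lowest summed rank over all metrics.

    `scores` maps assembly name -> {metric name -> value}.
    Assumption: every assembly has every metric, and higher is better.
    Equal values within a metric keep their input order (no tie handling).
    """
    metrics = sorted({m for values in scores.values() for m in values})
    rank_sum = {name: 0 for name in scores}
    for metric in metrics:
        # Best value first, so the best assembly gets rank 1.
        ordered = sorted(scores, key=lambda n: scores[n][metric], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sum[name] += rank
    # Lowest rank sum wins; names are sorted first so ties break alphabetically.
    return min(sorted(rank_sum), key=lambda n: rank_sum[n])


# Hypothetical validation results for three hypothetical assemblers.
example_scores = {
    "assembler_a": {"metric_x": 0.90, "metric_y": 0.70},
    "assembler_b": {"metric_x": 0.85, "metric_y": 0.95},
    "assembler_c": {"metric_x": 0.60, "metric_y": 0.80},
}

best = pick_best_assembly(example_scores)
print(best)
```

With these made-up numbers, assembler_b wins: it is never the worst on either metric, whereas assembler_a is best on one metric but worst on the other. A real project would of course weight the metrics according to its goals.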

The manuscript was a pleasure to read, easily understandable and short. I have mostly small comments, see below. iMetAMOS incorporates a fantastic suite of current state-of-the-art open source tools, and should perform well straight out of the box. I didn’t find anything missing that would be essential for its goals. Well, maybe one thing. Perhaps a future version could add an assembly polishing/improvement tool such as Pilon (from the Broad Institute, http://www.broadinstitute.org/software/pilon/).

In addition, I appreciate that users can add their own tools if they want (lack of time prevented me from trying this out). I was able to successfully download, install and use the single-file binary, albeit with some hiccups (see below). The quick test using a lambda phage dataset finished, as did the run of the ERR234097 example mentioned on http://www.cbcb.umd.edu/software/imetamos. However, for ERR234097, I did not obtain the results mentioned in the supplementary Excel file, see below.

An off-the-shelf tool like iMetAMOS is great for push-button bioinformatics. However, it allows the user to get away with not fully understanding the tools, parameters, etc. of the workflow. Here, we have to trust the authors to have done a thorough job, reporting on the choices made, and enabling thorough evaluation of the underlying material. First, the authors have a long track record in the field and I fully trust they know what they are doing. Second, all intermediate results are saved in the project directory, and all steps are logged in extensive log files. So, an interested user can (and, I would say, probably should) go and inspect each step of the procedure. If anything, there is an overload of information, which makes finding details a bit hard.

In general, the authors seem to aim to make life easier for scientists with little bioinformatics experience. For example, making available a single binary, with all dependencies included, removes the burden of having to download and install a lot of software. The program generates a single html page at the end as point-of-entry for evaluating the results and downloading the assembly. All these aspects I applaud. However, there are a few improvements possible here. Mainly, the documentation could be improved. The ‘readthedocs’ pages (http://metamos.readthedocs.org/en/latest/index.html) have very little information on how to run the software. Installation instructions assume one knows how to extract a tar.gz file. The github pages (https://github.com/marbl/metAMOS) are more detailed, but, as an example, lack a description of what the different steps of the workflow do (Preprocess, Assemble, FindORFs, …). At the end of running the software, unless you know where to look, it is not straightforward to find the html pages generated. So in general, the software could be made easier to use for a beginner. There are also three entry points (readthedocs, github and the http://www.cbcb.umd.edu/ pages), which I found confusing.

In conclusion, I would recommend iMetAMOS to users, both those with little bioinformatics experience and super users, who want to perform bacterial genome assemblies using short-read sequencing data.


For the complete review report, see https://publons.com/review/3544/.
