Our review of “Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data”, aka the HGAP paper

As it is out in the open that I was one of the reviewers of the ‘HGAP’ paper, I thought I might as well make my review publicly available.

I have posted the review report (from February 2013) online at Publons. The review was actually done together with a PhD student in the group, Ole Kristian Tørresen. I like to do reviews together with others: it leads to better reviews and is a great learning experience for students!

Here are the first few paragraphs. Enjoy!


De novo bacterial genome assembly: a solved problem?

Pacific Biosciences published a paper earlier this year on an approach to sequencing and assembling a bacterial genome that yields a near-finished or finished genome. The approach, dubbed Hierarchical Genome Assembly Process (HGAP), uses only PacBio reads, without the need for short reads. This is how it works:

  • generate a high-coverage dataset of the longest reads possible, aim for 60-100x in raw reads
  • pre-assembly: use the reads from the shorter part of the raw read length distribution to error-correct the longest reads; set the length cutoff so that the longest reads make up about 30x coverage
  • use the long, error-corrected reads in a suitable assembler, e.g. Celera Assembler, to produce contigs
  • map the raw PacBio reads back to the contigs to polish the final sequence (or rather, recall the consensus using the raw reads as evidence) with the Quiver tool
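
To make the cutoff in the pre-assembly step concrete, here is a minimal Python sketch of how one could pick it from a list of raw read lengths. This is only an illustration of the arithmetic, not the code HGAP itself uses; the function names `raw_coverage` and `seed_length_cutoff` and the toy numbers are made up for this example.

```python
def raw_coverage(read_lengths, genome_size):
    """Total sequenced bases divided by the (estimated) genome size."""
    return sum(read_lengths) / genome_size


def seed_length_cutoff(read_lengths, genome_size, target_coverage=30.0):
    """Largest length cutoff such that reads at least this long
    together reach target_coverage times the genome size.

    Returns None if all reads together fall short of the target.
    Hypothetical helper for illustration; not part of HGAP.
    """
    needed = target_coverage * genome_size
    total = 0
    # Walk from the longest read downwards, accumulating bases,
    # and stop at the first read that takes us past the target.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= needed:
            return length
    return None


# Toy example: six reads and a 500 bp "genome".
lengths = [10000, 8000, 6000, 4000, 2000, 1000]
print(raw_coverage(lengths, 500))        # 62.0
print(seed_length_cutoff(lengths, 500))  # 8000
```

Reads at or above the returned cutoff would be the long reads to error-correct; the reads below it are the ones used to correct them. For a real dataset, the read lengths would come from the raw read files and the genome size from an estimate for the organism.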

The approach is very well explained on this website. As an aside, the same principle can now be used with the PacBioToCA pipeline.
