homopolymer | In between lines of code

Nick Loman was kind enough to give me an advance copy of his paper in Nature Biotechnology entitled “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al., 2012). I thought I would present a quick summary of the paper here and add some comments of my own.

The paper sets out to “compare the performance of three sequencing platforms [Roche GS Junior, Ion Torrent PGM and Illumina MiSeq] by analysing data with commonly used assembly and analysis pipelines.” To do this, they chose a strain from the outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which caused a lot of trouble in Germany about a year ago. The study is unique in that it focuses on the use of these instruments for de novo sequencing, not resequencing.

First, they used the ‘big brother’ of the GS Junior, the GS FLX, to generate a reference genome (combining long reads obtained using the GS FLX+ with mate pairs obtained using Titanium chemistry). Then, the same strain was sequenced on the benchtop instruments, and these reads were compared to the reference assembly. The reads were compared both directly and after assembly with a few commonly used programs.


A quick summary for the impatient:

An analysis of the homopolymer distribution of the recently released ‘longer’ Ion Torrent reads indicates a possible significant over-calling of homopolymer lengths towards the ends of the reads. Trimming the ends off, however, only marginally improved de novo assembly of the reads using newbler.
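For readers who want to try this kind of check on their own reads, here is a minimal sketch of how a homopolymer length distribution could be tallied separately for the body and the tail of each read. It is not the script used for the analysis in this post: the function names, the `tail=50` window and the `min_run=2` threshold are all assumptions chosen for illustration. Without a reference it cannot prove over-calling; it only shows whether longer runs are enriched towards the read ends.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(read):
    """Yield (start_position, base, run_length) for each run of identical bases."""
    pos = 0
    for base, group in groupby(read):
        length = sum(1 for _ in group)
        yield pos, base, length
        pos += length


def run_lengths_by_region(reads, tail=50, min_run=2):
    """Count run lengths separately for the read body and the last `tail` bases.

    `tail` and `min_run` are illustrative defaults, not values from the post.
    A run is assigned to the tail if it starts within the last `tail` bases.
    """
    body, end = Counter(), Counter()
    for read in reads:
        cutoff = max(len(read) - tail, 0)
        for start, _base, length in homopolymer_runs(read):
            if length < min_run:
                continue
            target = end if start >= cutoff else body
            target[length] += 1
    return body, end


def trim_end(read, n):
    """Remove the last n bases of a read (a simple fixed-length end trim)."""
    return read[:-n] if n > 0 else read
```

Comparing the two counters (for example, the fraction of runs of length 5 or more in each) gives a rough picture of whether the read ends behave differently, and `trim_end` can then be applied before handing the reads to an assembler.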

Life Technologies recently released ‘long’ Ion Torrent reads (B14_387, resequencing of E. coli strain DH10B, available through the Ion Community here). There is an accompanying application note that brags about the reads’ accuracy, especially compared to reads from the MiSeq platform. These accuracy measurements are, logically, based on alignment to a reference genome.

But what about de novo assembly? Thing is, the dataset presented, with a peak length of 241 bases (see below) and 350 000 reads, is quite similar to what a full plate of GS FLX gave you in 2007 (peak length 250 bases, 400 000 reads). And 454 reads were very useful at the time for de novo assembly (in fact, they were the only reads available for this purpose, besides Sanger reads of course).


Biology, sequencing, bioinformatics and more

Tag Archives: homopolymer

Fast genome sequencing of pathogenic bacteria – which benchtop instrument to choose?

IonTorrent: longer reads, longer homopolymers?