“Loman et al reflects the past, not the present” says Life Technologies/Ion Torrent in a slide set accompanying a response, published yesterday, to the recent paper by Nick Loman et al, “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al, 2012). See also my coverage of this paper in my previous blog post.
It is a critique I have read and heard more than once: the data used for the analyses in the Loman et al paper are already old, as the technologies have since improved. This is of course true, particularly so for Ion Torrent. True as it is, it is not a fair critique. Researchers, and Nick Loman and yours truly are no exception, are bound by the ‘publish or perish’ mantra. We are dependent on publishing peer-reviewed articles for obtaining grants, establishing our reputation, and for getting our next job. Peer review takes time: “Right now the time lag between finishing a paper, and the relevant worldwide research community seeing it, is between 6 months and 2 years.” (source). Nick’s paper was ‘Received 19 December 2011’, ‘Accepted 30 March 2012’ and finally ‘Published online 22 April 2012’. This is actually quite fast, taking into consideration that the authors developed numerous new tools for the analyses (see the github repository accompanying the paper).
Sure, we can use a blog to circumvent the time delay, and publish a finding immediately, something Nick is actively doing (as am I through this blog). But, sometimes we need to go the peer-review route, for the reasons explained above.
It is therefore unavoidable that articles, like the one from Nick Loman, contain ‘old’ data. Heck, there are still lots of papers coming out based on 454 GS FLX and Illumina GA II(x) data. In addition, there must be many groups analysing IonTorrent/454/Illumina data they obtained using the same ‘generation’ of kits as were used for the Loman et al paper. These people will absolutely want to know about the different error types and accuracy levels. At the very least, the data presented in the paper give an overview of the relative performance at the time of the study, which might reflect on today’s performance.
I appreciate Ion Torrent people jumping on the occasion, requesting a sample from the strain used for the paper, sequencing with their latest chemistry, and redoing some of the analyses, all in less than three weeks. The results look promising, although they need to be quality checked by the community (bloggers like us, I guess). But don’t blame the messenger for taking the established route. Let’s rather congratulate Nick Loman et al for a job well done and a well deserved publication in Nature Biotechnology!
Agree? Disagree? Feel free to drop me a comment below!
Nick Loman was kind enough to give me an advanced copy of his paper in Nature Biotechnology entitled “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al, 2012). I thought to present a quick summary of the paper here and add some comments of my own.
The paper sets out to “compare the performance of three sequencing platforms [Roche GS Junior, Ion Torrent PGM and Illumina MiSeq] by analysing data with commonly used assembly and analysis pipelines.” To do this, they chose a strain from the outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which caused a lot of trouble in Germany about a year ago. The study is unique in that it focuses on the use of these instruments for de novo sequencing, not resequencing.
First, they used the ‘big brother’ of the GS Junior, the GS FLX, to generate a reference genome (combining long reads obtained using the GS FLX+, and mate pairs using Titanium chemistry). Then, the same strains were sequenced on the benchtop instruments, and these reads were compared to the reference assembly. The reads were both compared directly, and after assembly with a few commonly used programs.
Filed in Bioinformatics, Next Generation Sequencing
Tags: 454, assembly, GS Junior, homopolymer, Illumina, Ion Torrent, Ion Torrent PGM, MiSeq, newbler
(The impatient reader might want to skip to the conclusion at the end of this post…)
Last Wednesday, Ion Torrent released a tech note and associated run data with shotgun (single-end) and Mate Pair runs for Escherichia coli K12, substrain MG1655. Libraries with a 3.5 kbp and an 8.9 kbp insert size, as well as a shotgun library, were each sequenced on a 316 chip. In the tech note, they describe assemblies using different combinations of the data, and show how adding the mate pairs yields assemblies with fewer scaffolds and gaps. The Ion mate-pair protocol is very similar to the one used by 454 Life Sciences for their (unfortunately called) Paired-end libraries: long fragments are circularized using a linker sequence, and sequencing is performed across this linker, allowing for easy identification of the pair halves.
This is the first real ‘long-distance’ Mate Pair data from Ion Torrent, which is exciting and made me take a close look at it. I was especially interested in how the newbler program, developed by 454 Life Sciences for their 454 reads, would perform on these data.
Filed in Bioinformatics, Next Generation Sequencing
Tags: assembly, Ion Torrent, newbler
Each sequencing company has a workhorse genome they sequence a lot. PacBio sequences the lambda virus, Illumina uses PhiX. Both Ion Torrent and 454 use E. coli DNA, but while Ion Torrent takes E. coli K12 substr. DH10B, 454 chose E. coli K12 substr. MG1655.
I am interested in finding out to what extent Ion reads can replace the much more expensive 454 reads for de novo genome assembly (my field of speciality). Currently, the Ion read length is too short for the technology to be competitive, but this might change later this year, when (if …) the promised 400 bp reads become a reality.
In my comparisons of Ion data with 454 reads, I have always been hampered by the fact that the strains the platforms use as test samples were not completely identical. Luckily for me, today Ion Torrent released (behind the Ion Community login) a dataset on E. coli MG1655, a run with ID BEL-335. Imagine my joy! (Saves me from having our centre generate one on our, yet to be unpacked, PGM). The data is from a 314 chip, has 468 thousand reads and 54 Mbp raw data. That represents around 11x coverage of the MG1655 genome. This is a bit too low for what I ideally would like to have (around 30x), but alright. I set out to try this data in a de novo assembly using newbler, together with an equivalent data set of 454 reads.
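The coverage figure above is simple arithmetic, and a small sketch makes it easy to redo for other runs. The genome size used here (about 4.64 Mbp for MG1655) is my assumption and is not stated in the post; the function and constant names are made up for illustration.

```python
def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is covered."""
    return total_bases / genome_size


def mean_read_length(total_bases, n_reads):
    """Average read length in bases."""
    return total_bases / n_reads


# Assumption: approximate size of the E. coli K12 MG1655 genome.
MG1655_GENOME_BP = 4_640_000
# Figures quoted for run BEL-335 (314 chip).
RUN_BASES = 54_000_000
RUN_READS = 468_000

coverage = fold_coverage(RUN_BASES, MG1655_GENOME_BP)
read_len = mean_read_length(RUN_BASES, RUN_READS)
print(f"coverage: {coverage:.1f}x, mean read length: {read_len:.0f} bases")

# Bases that would be needed to reach the 30x I would ideally like to have.
print(f"bases needed for 30x: {30 * MG1655_GENOME_BP / 1e6:.0f} Mbp")
```

With these numbers, a single 314 run gives a bit less than half of the data needed for 30x.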
IonTorrent: longer reads, longer homopolymers?
August 10, 2011
A quick summary for the impatient:
An analysis of the homopolymer distribution of the recently released ‘longer’ Ion Torrent reads indicates a possible significant over-calling of homopolymer lengths towards the ends of the reads. Trimming the ends off, however, only marginally improved de novo assembly of the reads using newbler.
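To give an idea of what such a homopolymer analysis can look like, here is a minimal sketch, not necessarily the exact method used for this post. It splits each read into homopolymer runs and averages the run length per position bin along the read; the function names and the `bin_size` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def homopolymer_runs(read):
    """Yield (start_position, base, run_length) for each run of identical bases."""
    pos = 0
    for base, group in groupby(read):
        length = sum(1 for _ in group)
        yield pos, base, length
        pos += length


def mean_run_length_by_bin(reads, bin_size=50):
    """Mean homopolymer run length, grouped by where in the read the run starts."""
    totals = defaultdict(lambda: [0, 0])  # bin index -> [sum of lengths, run count]
    for read in reads:
        for start, _base, length in homopolymer_runs(read):
            bin_index = start // bin_size
            totals[bin_index][0] += length
            totals[bin_index][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}


# Toy example; real input would be the sequences from a fastq or sff file.
print(mean_run_length_by_bin(["AATGGG"], bin_size=3))
```

If the mean run length climbs in the last bins, that would hint at over-calling towards the read ends, although a comparison against the reference genome is needed to be sure.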
Life recently released ‘long’ IonTorrent reads (B14_387, resequencing of E. coli strain DH10B, available through the Ion Community here). There is an accompanying application note that brags about the reads’ accuracy, especially over reads from the MiSeq platform. These accuracy measurements are logically based on alignment to a reference genome.
But what about de novo assembly? Thing is, the dataset presented, with a peak length of 241 (see below) and 350 000 reads, is quite similar to what a full plate of GS FLX gave you in 2007 (peak length 250 bases, 400 000 reads). And 454 reads were very useful at the time for de novo assembly (in fact, the only reads available for this purpose, obviously besides Sanger reads).
Filed in Bioinformatics, Next Generation Sequencing
Tags: 454, homopolymer, Ion Torrent
Looking at the discussions on the Ion Community website, I came across an entry that mentions something interesting about the flow order. For both 454 and Ion Torrent, sequencing happens by flowing one dNTP (base) at a time over the template. For each read, one or more of these bases gets incorporated, or none at all (see also an entry on this at my other blog).
454 has been using the same flow order since the beginning: TACG. This can be seen from the ‘header’ part of the sff file, which lists the flow order under ‘Flow Chars’ (see here and here for examples).
The first Ion Torrent runs on the 314 chip used the exact same flow order as 454. The Ion Community entry I mentioned explains how for the 316 chip, for which the first data were released not too long ago on the Ion Community website, an entirely different flow order was used. Instead of a four-base repeated cycle, the following 32 base (!) sequence was used repeatedly:
TACGTACGTCTGAGCATCGATCGATGTACAGC
Why would this ‘weird’ flow order be used? The different flow order apparently helps to remove incomplete extension, yielding longer read lengths. Incomplete extension happens when a subset of the template molecules on a single bead wrongly does not incorporate a base, making them out of sync with the rest of the molecules, and causing noise during later flows. The new flow order allows for these molecules to ‘catch-up’, so the different template molecules are better synchronized. A drawback of the 32-base flow order is the (on average) lower number of incorporations per flow, meaning more flows are needed for the same read length.
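The drawback mentioned last can be made concrete with a toy simulation. This is only a sketch of ideal flow-based sequencing (no incomplete extension, no noise), so it shows the cost of the longer flow order and not its benefit; the function name and the `max_flows` guard are mine, not something from Ion Torrent or 454.

```python
import random
from itertools import cycle

FLOW_454 = "TACG"
FLOW_ION_316 = "TACGTACGTCTGAGCATCGATCGATGTACAGC"


def flowgram(template, flow_order, max_flows=10000):
    """Ideal flow signals: number of bases incorporated during each flow."""
    signals = []
    pos = 0
    for base in cycle(flow_order):
        # Stop when the template is fully read (max_flows guards odd input).
        if pos >= len(template) or len(signals) >= max_flows:
            break
        n = 0
        # A whole homopolymer stretch is incorporated within a single flow.
        while pos < len(template) and template[pos] == base:
            n += 1
            pos += 1
        signals.append(n)
    return signals


print(flowgram("TTACG", FLOW_454))  # [2, 1, 1, 1]

# Compare the number of flows needed for the same random 100-base template.
random.seed(1)
template = "".join(random.choice("ACGT") for _ in range(100))
for name, order in (("TACG", FLOW_454), ("32-base", FLOW_ION_316)):
    n_flows = len(flowgram(template, order))
    print(f"{name}: {n_flows} flows, {len(template) / n_flows:.2f} bases per flow")
```

Running this on many random templates should give an impression of how many extra flows the 32-base order costs on average.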
It looks like Ion is experimenting with other flow orders to get even better results. Now there is something 454 might give a try (although Life probably has or will take a patent out on this)…
For those of you who have access, the post is located here, and another one here.
For three platforms, reads longer than those commercially available, and/or from not-yet released instruments, have become accessible online. By online, I mean that we all can download these data to have a look at:
1) MiSeq 2x 150 bases runs
As part of the German E. coli (EHEC) ‘Crowdsourcing Project’, Illumina sequenced five strains for the UK Health Protection Agency. The fastq files can be downloaded from http://www.hpa-bioinformatics.org.uk/lgp/genomes. These are the first data in the public domain from a MiSeq!
See also this post on GenomeWeb.
2) IonTorrent 316 chip
Keith Robison shares a bit of info on data from an Ion 316 chip on his ‘Omics! Omics!’ blog: “1.69M reads, with 1.53M of those >=50 bp long and 1.07M 100bp or longer”:
I downloaded the run files, and quickly looked at the read length distribution of the trimmed reads in the sff file (which listed 260 flows, 40 more than the file I analyzed in my previous post), showing a peak exactly one base longer at 109 bases. So, many more reads but not much gain in length (yet). Note the strange shape of the peak:
3) 454 GS FLX+
As part of the assemblathon2 (a de novo assembly competition), the first GS FLX+ reads (from a parrot) have been released, with a peak read length of around 736 bases: http://bioshare.bioinformatics.ucdavis.edu/Data/hcbxz0i7kg/Parrot/. Those are at Sanger read length, now!
Now I need to find the time to have a look at these data!
Filed in Next Generation Sequencing
Tags: 454, GS FLX+, Illumina, Ion Torrent, MiSeq
There is more (length) to Ion Torrent reads than meets the eye (and is Ion Torrent hiding it?)
June 16, 2011
A quick summary for the impatient:
The sff files from the E. coli Ion Torrent runs released by EdgeBio show much longer raw reads than the trimmed reads in the corresponding fastq/fasta files. The quality of those extra bases, however, is very low. This shows the potential for longer reads from the Ion Torrent platform.
The sff file released by Ion Torrent through their Dev Community site has these extra bases masked, which makes one wonder whether they are trying to hide something…
Part 1: EdgeBio’s data
When EdgeBio released six runs with E. coli DH10B Ion Torrent data (see http://www.edgebio.com/blog/?p=191), I decided to have a look inside the sff files they provided. I downloaded the data from data.edgebio.com, and used Roche’s sffinfo command to ‘peek inside’. The sffinfo command, accompanying the 454 Life Sciences software suite, will list the content of the binary sff file in text format (see the post on my other blog). Other, open source/access tools, such as the ones I mention on my blog, might do this as well.


