As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument (different from last time) seems to be missing, hang on, I’m coming back to that…
Notable changes from the June 2015 edition
I added the Illumina MiniSeq
I added the Oxford Nanopore MinION. The read length for this instrument was based on the specifications for maximal output and number of reads from the company’s website. The two data points represent ‘regular’ and ‘fast’ modes.
I added the IonTorrent S5 and S5XL. You may notice that the line for this instrument has a downward slope, this is due to the fact that the 400 bp reads are only available on the 520 and 530 chip, but not the higher throughput 540 chip, making the maximum throughput for this read length lower than for the 200 bp reads.
As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument seems to be missing, hang on, I’m coming back to that…
I attended, for the first time, the Advances in Genome Biology and Technology (AGBT) meeting in Florida. With this post, I intend to summarise my experiences of the meeting. I will not cover everything that happened at the meeting, but focus of the areas of my own interest.
With this post I present a figure I’ve been working on for a while now. With it, I try to summarise the developments in (next generation) sequencing, or at least a few aspects of it. I’ve been digging around the internet to find the throughput metrics for the different platforms since their first instrument version came out. I’ve summarised my findings in the table at the end of this post. Then, I visualised the results by plotting throughput in raw bases versus read length in the graph below.
A potential user (‘customer’) of our sequencing platform asked how to generate reference genomes for his 4 bacterial strains. His question inspired me to write this post. The suggestions below are not absolute, just my thoughts on how one these days could go about sequencing a bacterial genome using one or more of the sequencing platforms. I would appreciate any feedback/suggestions in the comments section!
Option 1: bits and pieces
Libraries: paired end or single end sequencing
Platform: one or more of Illumina MiSeq or HiSeq, Ion Torrent PGM, 454 GS FLX or GS Junior
Image from Wikimedia Commons. (Buzz was a low-cost airline based at London Stansted operating services to Europe. It was sold to Ryanair.)
I am not attending the American Society of Human Genetics meeting in San Fransisco, but can’t escape the buzz it creates on twitter (hashtag #ashg2012). Strikingly, it is almost another AGBT when it comes to announcements from companies selling sequencing instrument. All of them had something new to bring to the floor. This post summarizes what I picked up from twitter and a few websites, and I give a bit of my perspectives on the respective announcements. I am focussing on technology improvements, especially with regard to read lengths, not so much on applications such as cancer resequencing panels.
“Loman et al reflects the past, not the present” says Life Technologies/ Ion Torrent in a slide set accompanying a response, published yesterday, to the recent paper by Nick Loman et al, “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al, 2012). See also my coverage of this paper in my previous blog post.
It is a critique I have read and heard more often: the data used for the analyses in the Loman et al paper is already old, as the technologies have now improved. This is of course true, particularly so for Ion Torrent. However true, it is not a fair critique. Researchers, and Nick Loman and yours truly are not an exception, are bound by the ‘publish or perish’ mantra. We are dependent on publishing peer-reviewed articles for obtaining grants, establishing our reputation, and for getting our next job. Peer review takes time: “Right now the time lag between finishing a paper, and the relevant worldwide research community seeing it, is between 6 months and 2 years.” (source). Nick’s paper was ‘Received 19 December 2011″, “Accepted 30 March 2012” and finally “Published online 22 April 2012”. This is actually quite fast, taking into consideration the authors developed numerous new tools for the analyses (see the github repository accompanying the paper).
Nick Loman was kind enough to give me an advanced copy of his paper in Nature Biotechnology entitled “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al, 2012). I thought to present a quick summary of the paper here and add some comments of my own.
The paper sets out to “compare the performance of three sequencing platforms [Roche GS Junior, Ion Torrent PGM and Illumina MiSeq] by analysing data with commonly used assembly and analysis pipelines.” To do this, they chose a strain from the outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which caused a lot of trouble in Germany about a year ago. The study is unique in that it is focuses on the use of these instruments for de novo sequencing, not resequencing.
First, they used the ‘big brother’ of the GS Junior, the GS FLX, to generate a reference genome (combining long reads obtained using the GS FLX+, and mate pairs using Titanium chemistry). Then, the same strains were sequenced on the benchtop instruments, and these reads were compared to the reference assembly. The reads were both compared directly, and after assembly with a few commonly used programs.
(The impatient reader might want to skip to the conclusion at the end of this post…)
Last wednesday, Ion Torrent released a tech note and associated run data with shotgun (single-end) and Mate Pair runs for Escherichia coli K12, substrain MG1655. Both a 3.5 kb and 8.9 kbp insert size, as well as a shotgun library, were sequenced on a 316 chip each. In the tech note, they describe assemblies using different combinations of the data, and show how adding the mate pairs yields assemblies with fewer scaffolds and gaps. The Ion mate-pair protocol is very similar to the one used by 454 Life Sciences for their (unfortunately called) Paired-end libraries: long fragments are circularized using a linker sequence, and sequencing is peformed across this linker, allowing for easy identification of the pair halves.
This is the first real ‘long-distance’ Mate Pair data from Ion Torrent, which is exciting and made me have a close look at it. I was especially interested in how the newbler program, developed by 454 Life Science for their 454 reads, would perform on these data.