Developments in next generation sequencing – a visualisation
December 3, 2012
With this post I present a figure I’ve been working on for a while now. With it, I try to summarise the developments in (next generation) sequencing, or at least a few aspects of it. I’ve been digging around the internet to find the throughput metrics for the different platforms since their first instrument version came out. I’ve summarised my findings in the table at the end of this post. Then, I visualised the results by plotting throughput in raw bases versus read length in the graph below.
I think this visualisation nicely summarises what has happened over the past 7 years, since the GS 20 from 454 came on the market in 2005. For example, one could look at the results like this:
The cutoffs between the different classes are purely subjective. But they do show that the ‘intermediate’ class is where the fiercest competition is happening, at least between Illumina (MiSeq) and Life Technologies (Ion PGM and Proton).
- although I took the utmost care in collecting the data, I may have gotten some of my numbers completely wrong, for which I apologize in advance; please help me correct any mistakes or omissions by leaving a comment
- some data was obtained by going to previous versions of company websites through the Internet Archive
- I used full single-run specs with maximally stated throughput whenever appropriate
- sometimes, the total numbers of reads per full run and total bases obtained do not match up; for the figure, I always chose the reported throughput in bases
- for Illumina, I chose to use the single-end read length, although the maximum throughput was based on the sum of all reads from a paired end run; I felt it unfair to double the read length for this platform for the figure
- the 300 bp kits from Ion Torrent for the PGM are not taken into account here, as these are not listed under the specs for this platform at the time of writing
- Pacific Biosciences does not seem to report per-SMRTCell metrics on their website, so I used the throughput specs as the company communicated them to us instead; this means that the announced upgrade (C2 XL, average read length 4.3 Kbp) has not yet been taken into account
- although some users report running the HiSeq at 2 x 150 bp, this is not supported (listed) on the Illumina website, so I stuck with 100 bp
- ‘HiSeq’ stands for both HiSeq 2000 and HiSeq 2500, as according to the Illumina website, these instruments give the same maximal throughput
- the figure was mashed together using MS Excel and Powerpoint, because I don’t know how to use a program that does a nicer job; if you want to help me give the figure a more professional look, let me know!
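For readers who want to check the numbers themselves: the bookkeeping behind the notes above (throughput as number of reads times read length, with a paired-end run counting both reads) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the way the table was made. The function names are my own, `reads` is assumed to mean fragments (so a paired-end run yields two reads per fragment), and the example values are made-up placeholders, not vendor specs.

```python
def expected_bases(reads, read_length, paired_end=False):
    """Bases implied by a read count and the single-end read length.

    Assumption: `reads` counts fragments (clusters), so a paired-end
    run contributes two reads of `read_length` per fragment.
    """
    ends = 2 if paired_end else 1
    return reads * read_length * ends


def relative_mismatch(reported_bases, reads, read_length, paired_end=False):
    """Fraction by which the reported throughput differs from the implied one."""
    implied = expected_bases(reads, read_length, paired_end)
    return abs(reported_bases - implied) / reported_bases


# Hypothetical placeholder values, NOT taken from any spec sheet
reads = 1_000_000
read_length = 100          # single-end read length in bp
reported = 200_000_000     # reported throughput in bases

print(relative_mismatch(reported, reads, read_length, paired_end=True))
```

A mismatch well above zero for a real platform would be one of the cases mentioned above, where the reported reads and bases do not match up.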
Here is the ‘raw’ data – please help me improve the table through the comments section.
EDIT: the table was reproduced ‘in a web-friendly tabular form’ on the next gen seek blog.