As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument seems to be missing; hang on, I’m coming back to that…
Notable changes from the June 2014 edition
- I added the Illumina HiSeq 4000
- the HiSeq 2500 Rapid Run upgraded to 2×250 bp read length
- PacBio upgraded to P6-C4 metrics
- read numbers (but not the other metrics) for the full PacBio runs were updated, as they previously reflected those for single SMRTCells
But, where is the Oxford Nanopore MinION?
The Oxford Nanopore MinION is a bit tricky. My metrics are based on company specifications that anyone can view on the company's website, for commercially released instruments and (chemistry) updates. The commercial release of the MinION now seems to have happened, as it was announced that everyone can apply for the MinION Access Program and will be accepted (barring some sanity screening they will probably do). But metrics for this instrument are a different matter. There are no company specs that I can find. Partly this is understandable, as read length, for example, depends on the input length of the sample (or library, rather). The other reason for the lack of specifications may be the philosophy behind Oxford Nanopore's MinION Access Program: it is the users who are discovering what the instrument can do, rather than the company telling the customer what to expect.
For my visualisation, this lack of specification causes a problem, as this makes it impossible to point to a source for where to place the MinION on the plot. Hence it is lacking from the figure above.
As a (temporary?) solution, here is where the MinION more or less belongs, based purely on what I have read in published scientific articles, preprints, and webpages. Note that I chose to represent it with a cloud rather than a single, solid datapoint…
- note how close together the data points fall for the HiSeq 4000 and the HiSeq X
- [EDIT] another new ‘instrument’ is the HiSeq X Five, but the single instrument is the same as for the HiSeq X Ten, so no new datapoint was generated for the X Five
- Illumina instruments clearly dominate the top middle part of the figure (between 100 and 300 bp read length, and 10 Gb and 2 Tb throughput)
- the Complete Genomics/BGI Revolocity is missing from the figure, again because there are no company specifications on the Complete Genomics website. There is not yet enough information to piece together what the metrics are; all that is known is that the read length is 28 bp paired end.
- [EDIT] a reader left a comment which I’d like to quote: “The Amersham/GE Healthcare MegaBACE 4500 was a 384-capillary instrument, with readlengths over 1000 bp. Yet, due to ABI’s better dominance in the market, the MB 4500 never had much penetration.”
- [EDIT] see also this reader comment on the history of the ABI 37* instruments. I’ve added 2002 as release date for the 3730xl as a result. It would be fun to try to dig up more information on the other instruments on the market before the 454 and Solexa ones came out…
- as mentioned in the original blog post: some data was obtained by going to previous versions of company websites through the Internet Archive
- I used full single-run specs with maximally stated throughput as available at the time of writing
- sometimes, the total numbers of reads per full run and total bases obtained do not match up; for the figure, I always chose the reported throughput in bases
- for Illumina, I chose to use the single-end read length, although the maximum throughput was based on the sum of all reads from a paired end run; I felt it unfair to double the read length for this platform for the figure
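The last two notes can be checked with a bit of arithmetic. The following is a minimal sketch, not the script used for the figure: it takes a few rows from the raw data table at the end of this post and compares reads × read length with the reported gigabases per run. The names `ROWS`, `computed_gigabases` and `compare` are made up for this illustration, and doubling the bases for a paired-end run is my reading of the Illumina note, which the HiSeq 2500 rows happen to agree with.

```python
# Rows copied from the raw data table in this post:
# (instrument, year, reads per run, single-end read length in bp, reported gigabases)
ROWS = [
    ("454 GS FLX Titanium", 2009, 1_000_000, 500, 0.45),
    ("IonTorrent PGM 318 chip", 2012, 5_000_000, 200, 1.0),
    ("Illumina HiSeq 2500 RR", 2012, 600_000_000, 150, 180.0),
    ("Illumina HiSeq 2500 RR", 2014, 600_000_000, 250, 300.0),
    ("PacBio RS II P6 C4", 2014, 660_000, 13_500, 9.0),
]


def computed_gigabases(reads, read_length, paired=False):
    """Reads x read length in gigabases; doubled when paired=True.

    Assumption for this sketch: a paired-end run yields two reads of
    the stated single-end length per counted read.
    """
    ends = 2 if paired else 1
    return reads * read_length * ends / 1e9


def compare(rows):
    """Return (instrument, year, single-end Gb, paired-end Gb, reported Gb) per row."""
    result = []
    for name, year, reads, length, reported in rows:
        single = computed_gigabases(reads, length)
        paired = computed_gigabases(reads, length, paired=True)
        result.append((name, year, single, paired, reported))
    return result


if __name__ == "__main__":
    for name, year, single, paired, reported in compare(ROWS):
        print(f"{name} ({year}): single-end {single:g} Gb, "
              f"paired-end {paired:g} Gb, reported {reported:g} Gb")
```

For the 454 and PacBio rows the computed and reported values differ slightly, which is the mismatch mentioned above; for the HiSeq 2500 rows the reported throughput only matches when both reads of a pair are counted.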
Data and figures are released under a CC0 license at figshare, with DOI 10.6084/m9.figshare.100940. I’ve also added the content to GitHub at https://github.com/lexnederbragt/developments-in-next-generation-sequencing.
As before: although I took the utmost care in collecting the data, I may have gotten some of my numbers completely wrong, for which I apologise in advance; please help me correct any mistakes or omissions by leaving a comment or sending me a pull request.
Finally, the raw data
| Platform | Instrument | Year | Reads per run | Read length (bp; mode or average) | Bases per run (gigabases) | Source |
|---|---|---|---|---|---|---|
| 454 | GS FLX Titanium | 2009 | 1000000 | 500 | 0.45 | |
| IonTorrent | PGM 314 chip | 2011 | 100000 | 100 | 0.01 | 3 |
| IonTorrent | PGM 316 chip | 2011 | 1000000 | 100 | 0.1 | 3 |
| IonTorrent | PGM 318 chip | 2011 | 5000000 | 100 | 0.5 | 3 |
| IonTorrent | PGM 318 chip | 2012 | 5000000 | 200 | 1 | 3 |
| IonTorrent | PGM 318 chip V2 | 2013 | 5000000 | 400 | 2 | 12 |
| Illumina | HiSeq 2500 RR | 2012 | 600000000 | 150 | 180 | 13 |
| Illumina | HiSeq 2500 RR | 2014 | 600000000 | 250 | 300 | 13 |
| PacBio | RS C2 XL | 2012 | 432000 | 4300 | 1.858 | |
| PacBio | RS II C2 XL | 2013 | 564000 | 4600 | 2.594 | 15 |
| PacBio | RS II P5 C3 | 2014 | 528000 | 8500 | 4.500 | 15 |
| PacBio | RS II P6 C4 | 2014 | 660000 | 13500 | 9.000 | 15 |
Read length: mode or average.
Sources: see this file from the GitHub repo.
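For anyone who wants to redraw the figure from these numbers, here is a rough sketch of a log-log plot of gigabases per run against read length. It is not the code behind the figure in this post; `RUNS`, `group_by_platform`, `plot_runs` and the output filename are names invented for this example, and it assumes matplotlib is installed. Only the rows listed in the table above are included.

```python
from collections import defaultdict

# Copied from the raw data table above:
# (platform, instrument, year, read length in bp, gigabases per run)
RUNS = [
    ("454", "GS FLX Titanium", 2009, 500, 0.45),
    ("IonTorrent", "PGM 314 chip", 2011, 100, 0.01),
    ("IonTorrent", "PGM 316 chip", 2011, 100, 0.1),
    ("IonTorrent", "PGM 318 chip", 2011, 100, 0.5),
    ("IonTorrent", "PGM 318 chip", 2012, 200, 1.0),
    ("IonTorrent", "PGM 318 chip V2", 2013, 400, 2.0),
    ("Illumina", "HiSeq 2500 RR", 2012, 150, 180.0),
    ("Illumina", "HiSeq 2500 RR", 2014, 250, 300.0),
    ("PacBio", "RS C2 XL", 2012, 4300, 1.858),
    ("PacBio", "RS II C2 XL", 2013, 4600, 2.594),
    ("PacBio", "RS II P5 C3", 2014, 8500, 4.5),
    ("PacBio", "RS II P6 C4", 2014, 13500, 9.0),
]


def group_by_platform(runs):
    """Map each platform to its (read length, gigabases, label) points, in table order."""
    series = defaultdict(list)
    for platform, instrument, year, length, gigabases in runs:
        series[platform].append((length, gigabases, f"{instrument} ({year})"))
    return dict(series)


def plot_runs(runs, path="throughput_vs_read_length.png"):
    """Draw one line per platform on log-log axes and save it to `path`."""
    # Imported here so the data helpers above also work without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for platform, points in group_by_platform(runs).items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        ax.plot(xs, ys, marker="o", label=platform)
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_xlabel("Read length (bp)")
    ax.set_ylabel("Gigabases per run")
    ax.legend()
    fig.savefig(path)
    return path


if __name__ == "__main__":
    print("Saved", plot_runs(RUNS))
```

Connecting the points of one platform in table order is a simplification: for IonTorrent it joins different chips as well as successive upgrades of the same chip.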