Loman et al reflects the past, not the present – a rebuttal

“Loman et al reflects the past, not the present,” says Life Technologies/Ion Torrent in a slide set accompanying a response, published yesterday, to the recent paper by Nick Loman et al, “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al, 2012). See also my coverage of this paper in my previous blog post.

Image credit: technorati.com http://bit.ly/uSYZIb

It is a critique I have read and heard more than once: the data used for the analyses in the Loman et al paper are already old, as the technologies have since improved. This is of course true, particularly so for Ion Torrent. However true, it is not a fair critique. Researchers, and Nick Loman and yours truly are no exceptions, are bound by the ‘publish or perish’ mantra. We depend on publishing peer-reviewed articles for obtaining grants, establishing our reputation, and getting our next job. Peer review takes time: “Right now the time lag between finishing a paper, and the relevant worldwide research community seeing it, is between 6 months and 2 years.” (source). Nick’s paper was “Received 19 December 2011”, “Accepted 30 March 2012” and finally “Published online 22 April 2012”. This is actually quite fast, considering that the authors developed numerous new tools for the analyses (see the github repository accompanying the paper).

Sure, we can use a blog to circumvent the time delay and publish a finding immediately, something Nick is actively doing (as am I through this blog). But sometimes we need to go the peer-review route, for the reasons explained above.

It is therefore unavoidable that articles like the one from Nick Loman contain ‘old’ data. Heck, there are still lots of papers coming out based on 454 GS FLX and Illumina GA II(x) data. In addition, there must be many groups analysing Ion Torrent/454/Illumina data obtained with the same ‘generation’ of kits as was used for the Loman et al paper. These people will absolutely want to know about the different error types and accuracy levels. At the very least, the data presented in the paper give an overview of the relative performance of the platforms at the time of the study, which may still be indicative of today’s performance.

I appreciate the Ion Torrent people seizing the opportunity: requesting a sample of the strain used for the paper, sequencing it with their latest chemistry, and redoing some of the analyses, all in less than three weeks. The results look promising, although they need to be quality-checked by the community (bloggers like us, I guess 🙂 ). But don’t blame the messenger for taking the established route. Let’s rather congratulate Nick Loman et al on a job well done and a well-deserved publication in Nature Biotechnology!

Agree? Disagree? Feel free to drop me a comment below!

9 thoughts on “Loman et al reflects the past, not the present – a rebuttal”

  1. Agree. Seems any time I’ve had an issue with Ion Torrent, they are quick to suggest that I need to use the most current version, rather than explaining what the problem was with an earlier one. We can’t get reagents, use them, analyze the data, and report on it in a three week interval.

  2. Totally agree here. Last year we ordered an Ion PGM without seeing any raw data, despite repeatedly asking for it, and without knowing the detailed procedure. Yes, we did develop a working assay for somatic mutation detection, by ignoring any single-base indels associated with homopolymers. We also realized we have to spend 3 hr on emulsion PCR and then 2 hr to initialize the PGM every day, or every two runs; conveniently, Ion Torrent only mentions the 2 hr sequencing time for the PGM. Using the 200 bp v2 chemistry and the latest software, what we see agrees with Loman’s paper. For PGM reads, Q scores drop off along the read, and more than 100 variant calls were made (>90% associated with homopolymers: 2-mers, 3-mers, 4-mers, etc.). There are also issues with unidirectional reads and loss of coverage due to shorter reads after base-quality trimming. Without bidirectional reads, even SNV calls are not certain. The only hope is paired-end reading. However, we are talking about a workflow like: finish one read, remove the chip for treatment, reload, and finish the second read.

    The MiSeq, in contrast, gives us a far simpler workflow (20 min hands-on time replacing all the emPCR and machine-initialization steps), far more throughput (e.g. 1.8 Gb of reads at >90% Q30), mostly near-full-length reads, and far fewer spurious variant calls. As for false positives on the MiSeq, we see only one recurrent homopolymer-associated false positive (>7 nt) and two recurrent strand-unbalanced false positives associated with high GC content, all of which are easy to deal with.

    If you are interested in detecting low-level SNVs, or mutations in tumor suppressor genes, which may be associated with frameshifts, the MiSeq is a far better choice than the PGM.
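    The homopolymer-indel filter described in this comment can be sketched roughly as follows (a minimal illustration only; the variant tuple layout and the `min_run` threshold are my own assumptions, not taken from the commenter’s actual pipeline):

```python
# Hypothetical sketch: drop single-base indel calls that fall inside a
# reference homopolymer run, since these dominate PGM false positives.
# All names and thresholds here are illustrative assumptions.

def in_homopolymer(ref_seq, pos, min_run=3):
    """True if 0-based position `pos` lies within a reference
    homopolymer of at least `min_run` identical bases."""
    for start in range(max(0, pos - min_run + 1), pos + 1):
        window = ref_seq[start:start + min_run]
        if len(window) == min_run and len(set(window)) == 1:
            return True
    return False

def filter_indels(ref_seq, variants):
    """Keep SNVs; drop 1-bp indels inside homopolymers.
    Each variant is a (pos, ref_allele, alt_allele) tuple."""
    kept = []
    for pos, ref, alt in variants:
        is_1bp_indel = abs(len(ref) - len(alt)) == 1
        if is_1bp_indel and in_homopolymer(ref_seq, pos):
            continue  # likely a platform homopolymer artifact
        kept.append((pos, ref, alt))
    return kept
```

    A real pipeline would of course work on VCF records and might also require bidirectional read support, as discussed above.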

  3. Respectfully disagree. I had first-hand experience on both sides of “publish or perish” during my academic days, and know exactly what you mean. But let’s not forget: before Aug 2011, both BGI and Life Tech had sequenced the German outbreak E. coli strain on the Ion Torrent platform, with raw reads and assemblies already released to the public. Those runs have lower overall error rates, or higher consensus accuracy, than the later datasets the Nature Biotech authors chose to use for the paper. I don’t see how the need to expedite publication fits in here.

    Also, the MiSeq data was generated around June 2011 and subsequently trimmed to reduce the raw error rate to 0.1%. In a response published on GenomeWeb, the lead author insists that MiSeq data from other labs now “shows the performance is at least as good as reported in our paper”. No other MiSeq customer has ever publicly claimed their reads are >99.9% accurate. None of the MiSeq runs in the SRA can be unambiguously and efficiently trimmed the same way as in the paper. After nearly a year, the official MiSeq web page still states that “data quality is comparable between the MiSeq and HiSeq”, while the HiSeq is known to be far less accurate than 99.9%.

    As a long-time reader of this blog and first-time commenter, many thanks to Lex for sharing your thoughts and promoting discussion on the topic. I do work in Life Technologies R&D, so I may be biased at times. But I made the recommendation for the new Ion sequencing kit only because it was expected to solve this specific customer’s problem. I would like to know whether it indeed worked or not.

      • Sorry Lex, I didn’t mean to make it sound like I was targeting you. It is only remotely relevant to the discussion.

  4. Regarding base-quality trimming and sequencing accuracy, I speak from my own experience with targeted resequencing of human cancer samples using both the Ion PGM and MiSeq platforms, with the same 3rd-party aligner and variant caller. To say that the MiSeq data have been heavily trimmed to achieve high accuracy is not true. If the PGM data had been subjected to the same level of quality trimming, there would be far lower depth of coverage left. For SNP detection, the PGM may be OK. For deep sequencing of low-percentage mosaic cancer mutations, the PGM is only useful in situations where homopolymer errors can be ignored. Unfortunately, this does not apply to recessive tumor suppressor genes.

    • Wendell, many “3rd party aligner and variant caller”s are tuned to handle the error modes of Illumina data, which dominated the market in the early days. I wonder if you see similar results from the Ion Variant Caller. In your MiSeq fastq files, are the trimming markers (the ‘B’ quality characters) scattered everywhere, instead of being clustered at the read ends as in the vendor-showcased dataset? I have only looked at SRA data so far and would love to confirm with a real customer.
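    As a rough sketch of the check being asked about here (assuming old Illumina 1.5-style Phred+64 FASTQ, where the quality character ‘B’ marks the read-segment quality-control indicator; function names are illustrative, not from any vendor tool):

```python
# Hypothetical sketch: given one FASTQ quality string, count how many
# 'B' markers there are in total versus in the contiguous 3'-end run,
# to see whether 'B's are clustered at the end or scattered.

def classify_b_runs(qual):
    """Return (n_b, tail_b): total 'B' characters, and those forming
    the contiguous run at the 3' end of the quality string."""
    n_b = qual.count("B")
    tail_b = len(qual) - len(qual.rstrip("B"))
    return n_b, tail_b

def scattered_fraction(qual):
    """Fraction of 'B' markers that are NOT in the 3'-end run.
    0.0 means all 'B's cluster at the end (or there are none)."""
    n_b, tail_b = classify_b_runs(qual)
    return 0.0 if n_b == 0 else (n_b - tail_b) / n_b
```

    Applied over all reads in a run, a high average scattered fraction would indicate the “scattered everywhere” pattern rather than the clean end-clustered trimming shown in the vendor dataset.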

      • Yutao:
        It might be true that 3rd-party aligners and variant callers handle Illumina data better.
        Since you work for Life Tech, perhaps you could explain or confirm how the Ion Variant Caller works better in dealing with PGM-specific errors. My guess: by using the SFF file for alignment and variant calling, additional information beyond quality scores, such as signal intensities and flow orders, can be used to recalibrate base calls and/or filter out variant calls associated with homopolymers. Maybe a flow-based alignment algorithm is used, with greater tolerance of variable homopolymer lengths. This could result in far fewer homopolymer errors (false positives). However, I would also think it could result in false negatives. For known real indel variants within homopolymer stretches, e.g. 2184delA in CFTR, or potential indels within tumor suppressor genes, false-negative errors by the PGM could be a real issue. According to a recent report by Dr. Corless at OHSU, the PGM missed the majority of a set of known indels in tumor samples when using the Ion VariantCaller. Perhaps a future version and paired-end reading could be the solution.
        Regarding the MiSeq data, I see isolated low-quality bases at various places, with a few more near the 3′ end. However, with very deep sequencing, I have no issue calling low-percentage mutations. Even the rare errors you mention can easily be rejected, because they occur in reads of only one direction.

      • Thanks for the information, Wendell. Speaking of false negatives: the official MiSeq website claims no indels were found in DH10B, despite trying various mpileup parameters. In fact there is at least one indel error in the reference they used, confirmed by BLAST, Ion, and SOLiD sequencing.

        I am not directly involved in developing the Ion Variant Caller. But based on what I know, your guess is correct, and the version in the v2.2 software should reduce false negatives. Your local Ion FBS and the Ion Community can help you better understand your missing variants. I am sure my colleagues who are actively working on variant calling would love to learn from your cases on the Ion Community.
