Developments in high throughput sequencing – June 2015 edition

This is the fourth edition of this visualisation; previous editions appeared in June 2014, October 2013 and December 2012.

As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument seems to be missing, hang on, I’m coming back to that…

Continue reading

Developments in next generation sequencing – June 2014 edition

This is the third edition of this visualisation; previous editions appeared in October 2013 and December 2012.

As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale:

Developments in next generation sequencing June 2014

Continue reading

Developments in next generation sequencing – a visualisation

With this post I present a figure I’ve been working on for a while now. With it, I try to summarise the developments in (next generation) sequencing, or at least a few aspects of it. I’ve been digging around the internet to find the throughput metrics for the different platforms since their first instrument version came out. I’ve summarised my findings in the table at the end of this post. Then, I visualised the results by plotting throughput in raw bases versus read length in the graph below.

Developments in next generation sequencing. http://dx.doi.org/10.6084/m9.figshare.100940


Continue reading

How to sequence a bacterial genome at the end of 2012

A potential user (‘customer’) of our sequencing platform asked how to generate reference genomes for his four bacterial strains. His question inspired me to write this post. The suggestions below are not absolute, just my thoughts on how one could go about sequencing a bacterial genome these days using one or more of the sequencing platforms. I would appreciate any feedback/suggestions in the comments section!

Option 1: bits and pieces

  • Libraries: paired end or single end sequencing
  • Platform: one or more of Illumina MiSeq or HiSeq, Ion Torrent PGM, 454 GS FLX or GS Junior
  • Bioinformatics: assembly: Velvet, SOAPdenovo, Newbler, MIRA, Celera
  • Outcome: up to hundreds of short contigs (with only single-end reads) or contigs + scaffolds (with paired end reads)
  • Pros: fast and cheap, OK for presence/absence of e.g. genes
  • Cons: doesn’t give much insight into the genome
  • Remarks: due to per-run throughput, multiplexing is recommended; data can also be used for mapping against a reference genome instead
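The multiplexing remark comes down to simple arithmetic: divide the run yield by the bases needed per genome. Below is a minimal sketch of that calculation. The function name and all three numbers (run yield, genome size, target coverage) are hypothetical placeholders chosen for illustration, not specifications of any instrument.

```python
def samples_per_run(run_yield_bases, genome_size_bases, target_coverage):
    """Return how many genomes fit in one run at the target coverage.

    All arguments are in bases (or fold coverage); the result is rounded down.
    """
    if genome_size_bases <= 0 or target_coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    # Bases needed to cover one genome at the requested depth.
    bases_per_sample = genome_size_bases * target_coverage
    return int(run_yield_bases // bases_per_sample)


# Hypothetical values, for illustration only:
RUN_YIELD = 1_500_000_000   # assumed yield of one run, in bases
GENOME_SIZE = 5_000_000     # assumed bacterial genome size, in bases
COVERAGE = 50               # assumed target fold coverage

print(samples_per_run(RUN_YIELD, GENOME_SIZE, COVERAGE))  # prints 6
```

In practice the usable yield is lower than the raw yield (filtering, uneven barcode representation), so it is probably wise to leave some margin.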

Continue reading

Fast genome sequencing of pathogenic bacteria – which benchtop instrument to choose?

Nick Loman was kind enough to give me an advance copy of his paper in Nature Biotechnology entitled “Performance comparison of benchtop high-throughput sequencing platforms” (Loman et al., 2012). I thought I would present a quick summary of the paper here and add some comments of my own.

The paper sets out to “compare the performance of three sequencing platforms [Roche GS Junior, Ion Torrent PGM and Illumina MiSeq] by analysing data with commonly used assembly and analysis pipelines.” To do this, they chose a strain from the outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which caused a lot of trouble in Germany about a year ago. The study is unique in that it focuses on the use of these instruments for de novo sequencing, not resequencing.

First, they used the ‘big brother’ of the GS Junior, the GS FLX, to generate a reference genome (combining long reads obtained using the GS FLX+ with mate pairs obtained using Titanium chemistry). Then, the same strain was sequenced on the benchtop instruments, and these reads were compared to the reference assembly. The reads were compared both directly and after assembly with a few commonly used programs.

Continue reading

More, longer, longest: new reads from three NGS platforms available online

For three platforms, reads longer than those commercially available, and/or from not-yet-released instruments, have become accessible online. By online, I mean that we can all download these data to have a look at them:

1) MiSeq 2x 150 bases runs
As part of the German E. coli (EHEC) ‘Crowdsourcing Project’, Illumina sequenced five strains for the UK Health Protection Agency; the fastq files can be downloaded from http://www.hpa-bioinformatics.org.uk/lgp/genomes. These are the first data in the public domain from a MiSeq!
See also this post on GenomeWeb.

2) IonTorrent 316 chip
Keith Robison shares a bit of info on data from an Ion 316 chip on his ‘Omics! Omics!’ blog: “1.69M reads, with 1.53M of those >=50 bp long and 1.07M 100bp or longer”.
I downloaded the run files, and quickly looked at the read length distribution of the trimmed reads in the sff file (which listed 260 flows, 40 more than the file I analyzed in my previous post), showing a peak exactly one base longer at 109 bases. So, many more reads but not much gain in length (yet). Note the strange shape of the peak:

3) 454 GS FLX+
As part of the Assemblathon 2 (a de novo assembly competition), the first GS FLX+ reads (from a parrot) have been released, with a peak read length of around 736 bases: http://bioshare.bioinformatics.ucdavis.edu/Data/hcbxz0i7kg/Parrot/. Those are at Sanger read length, now!

Now I need to find the time to have a look at these data!
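For a first look at any of these data sets, the kind of summary quoted above for the 316 chip (reads at or above a length threshold, and the peak of the length distribution) could be tallied along these lines. This is only a sketch: it assumes the reads are available as FASTQ with exactly four lines per record (the 316 chip analysis above worked on an sff file, which would need converting first), and the function names are my own.

```python
from collections import Counter
import io


def fastq_read_lengths(handle):
    """Yield the length of each read in a four-line-per-record FASTQ stream."""
    for line_number, line in enumerate(handle):
        # The sequence is the second line of every four-line record.
        if line_number % 4 == 1:
            yield len(line.strip())


def summarise_lengths(lengths, thresholds=(50, 100)):
    """Return (total reads, reads >= each threshold, most common length)."""
    counts = Counter(lengths)
    total = sum(counts.values())
    at_least = {
        t: sum(n for length, n in counts.items() if length >= t)
        for t in thresholds
    }
    # The peak is the most frequent read length (None for empty input).
    peak = max(counts, key=counts.get) if counts else None
    return total, at_least, peak


# Tiny made-up example with three reads of length 8, 4 and 8:
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
total, at_least, peak = summarise_lengths(fastq_read_lengths(example), thresholds=(5,))
print(total, at_least, peak)  # prints: 3 {5: 2} 8
```

For a real file, the `io.StringIO` object would be replaced by an open file handle; wrapped (multi-line) FASTQ records would need a proper parser instead.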