Each sequencing company has a workhorse genome that it sequences routinely. PacBio sequences the lambda virus; Illumina uses PhiX. Both Ion Torrent and 454 use E. coli DNA, but while Ion Torrent takes E. coli K12 substr. DH10B, 454 chose E. coli K12 substr. MG1655.
I am interested in finding out to what extent Ion reads can replace the much more expensive 454 reads for de novo genome assembly (my field of speciality). Currently, the Ion read length is too short for the technology to be competitive, but this might change later this year, when (if …) the promised 400 bp reads become a reality.
In my comparisons of Ion data with 454 reads, I have always been hampered by the fact that the strains the two platforms use as test samples were not completely identical. Luckily for me, today Ion Torrent released (behind the Ion Community login) a dataset on E. coli MG1655, a run with ID BEL-335. Imagine my joy! (It saves me from having our centre generate one on our yet-to-be-unpacked PGM.) The data is from a 314 chip and has 468 thousand reads and 54 Mbp of raw data. That represents around 11x coverage of the MG1655 genome. This is a bit lower than I would ideally like (around 30x), but alright. I set out to try this data in a de novo assembly using newbler, together with an equivalent data set of 454 reads.
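For readers who want to check the coverage figure, here is a minimal sketch of the arithmetic: mean coverage is simply total sequenced bases divided by genome length. The genome size used here (4,641,652 bp for the MG1655 reference, RefSeq NC_000913.3) is my assumption rather than a number from the run, and the read and base counts are the rounded figures quoted above, so treat the results as approximate.

```python
# Assumed reference length for E. coli K12 substr. MG1655 (RefSeq NC_000913.3).
GENOME_SIZE_BP = 4_641_652


def mean_coverage(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected mean depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length implied by the run totals."""
    return total_bases / n_reads


# Rounded run totals for BEL-335 as quoted in the text.
run_bases = 54e6      # 54 Mbp raw data
run_reads = 468_000   # 468 thousand reads

cov = mean_coverage(run_bases)
print(f"mean read length: {mean_read_length(run_bases, run_reads):.0f} bp")
print(f"coverage: {cov:.1f}x")
# Raw bases that would be needed to reach a 30x target.
print(f"bases for 30x: {30 * GENOME_SIZE_BP / 1e6:.0f} Mbp")
```

With these inputs it reports a mean read length of about 115 bp and roughly 11.6x coverage, and suggests that a 30x target would need around 139 Mbp, i.e. about two and a half runs of this size. Note that this uses raw bases; after quality trimming the effective coverage will be somewhat lower.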