A quick summary for the impatient:
The sff files from the E. coli Ion Torrent runs released by EdgeBio contain raw reads that are much longer than the trimmed reads in the corresponding fastq/fasta files. The quality of those extra bases, however, is very low. Still, this shows the potential for longer reads from the Ion Torrent platform.
The sff file released by Ion Torrent through their Dev Community site has these extra bases masked, which makes one wonder whether they are trying to hide something…
Part 1: EdgeBio’s data
When EdgeBio released six runs of E. coli DH10B Ion Torrent data (see http://www.edgebio.com/blog/?p=191), I decided to have a look inside the sff files they provided. I downloaded the data from data.edgebio.com and used Roche’s sffinfo command to ‘peek inside’. The sffinfo command, which comes with the 454 Life Sciences software suite, lists the content of the binary sff file in text format (see the post on my other blog). Other open-source tools, such as the ones I mention on my blog, might do this as well.
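For those who want to see how an sff file can hold both the raw and the trimmed read, here is a minimal Python sketch of how the clip points in a read header define the trimmed part, following the usual sff convention (1-based positions, 0 meaning 'not set'). The function names and the example numbers are made up for illustration; they are not taken from the EdgeBio files or from sffinfo itself.

```python
def trimmed_range(num_bases, clip_qual_left, clip_qual_right,
                  clip_adapter_left=0, clip_adapter_right=0):
    """Return the (first, last) 1-based inclusive positions of the trimmed read.

    A clip value of 0 means the clip point is not set.
    """
    # Left side: the innermost (largest) of the left clip points, at least 1.
    first = max(1, clip_qual_left, clip_adapter_left)
    # Right side: the innermost (smallest) right clip point; unset means read end.
    last = min(
        clip_qual_right if clip_qual_right else num_bases,
        clip_adapter_right if clip_adapter_right else num_bases,
    )
    return first, last


def extra_bases(num_bases, clip_qual_left, clip_qual_right,
                clip_adapter_left=0, clip_adapter_right=0):
    """Count the bases present in the raw read but absent from the trimmed read."""
    first, last = trimmed_range(num_bases, clip_qual_left, clip_qual_right,
                                clip_adapter_left, clip_adapter_right)
    trimmed_len = max(0, last - first + 1)
    return num_bases - trimmed_len


if __name__ == "__main__":
    # Hypothetical read: 250 raw bases, quality clip points at 5 and 104.
    print(trimmed_range(250, 5, 104))
    print(extra_bases(250, 5, 104))
```

The bases outside the range returned by `trimmed_range` are the 'extra' bases: they stay in the sff file but are dropped when trimmed fastq/fasta files are written.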