I attended, for the first time, the Advances in Genome Biology and Technology (AGBT) meeting in Florida. With this post, I intend to summarise my experiences of the meeting. I will not cover everything that happened at the meeting, but focus on the areas of my own interest.
First, I have already dedicated one post to one particular talk, the one by David Jaffe on the first data from the Oxford Nanopore MinION.
Here, I will add a few additional reflections on the MinION talk. I already alluded to the fact that the E. coli strain sequenced was methylation-negative. Someone I spoke to told me that the other species, Scardovia, also lacks methylated bases. This may indicate that methylated bases confuse the sequencing process. From methylation detection on the PacBio platform, we know that the signal from several bases upstream and downstream of a modified base differs from the signal in the absence of base modifications. It is speculative, but perhaps the MinION cannot (yet) sequence modified DNA.
Another aspect that was not touched upon by David Jaffe was the per-MinION throughput. The data presented allow for calculating the number of bases available for each strain: around 6x coverage in reads for the 5 Mbp E. coli strain, totalling 30 Mbp, and 13x coverage of the 1.6 Mbp Scardovia genome, totalling 21 Mbp. But we don’t know how many MinIONs were used for the data presented. Nor was anything said about a filtering step (by quality or length) applied to the raw reads before they were sent to Jaffe. So, as of yet, we still do not really know how many bases to expect from a single MinION run.
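The back-of-the-envelope arithmetic behind those yield figures is simply coverage times genome size. A minimal sketch (genome sizes and coverages as presented in the talk; the function name is my own):

```python
# Rough per-strain yield implied by the reported coverage figures.
# Note: how many MinIONs produced this, and whether reads were
# filtered beforehand, remains unknown.

def total_bases(coverage, genome_size_bp):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size_bp

ecoli = total_bases(6, 5_000_000)       # 6x of 5 Mbp -> 30 Mbp
scardovia = total_bases(13, 1_600_000)  # 13x of 1.6 Mbp -> ~21 Mbp

print(ecoli, scardovia)  # 30000000 20800000
```

Without knowing the number of flow cells behind those totals, the per-MinION yield cannot be derived from them.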
Finally, it remains to be seen how evenly the genome is covered and whether there is any bias against (high or low) GC regions. With the exception of PacBio, all sequencing platforms show significant biases against extreme GC regions, hampering recovery of those regions. It is important to determine how the MinION performs in this respect.
As I’ve written, our application for the MinION Access Program was not granted. I have no problem with that, as the program was massively oversubscribed and Oxford Nanopore had to make a selection amongst the applicants. One blogger, however, disagrees and has set up a petition asking Oxford to give me the possibility to test the MinION after all. I can’t really object to that initiative, although I would fully understand if even a massive number of signatures would not convince Oxford Nanopore – why grant me access when there are so many others who did not get into the program?
So, that was it for announcements from new sequencing platforms at AGBT. Oh, wait, almost forgot: there was actually another one. Drumroll… Genapsys announced their GENIUS system! Who? What? Well, I missed the talk, but as far as I can tell from twitter and the buzz at the meeting, I didn’t miss much. The presenter showed what was described as a lunch box sequencer. No data was shown. I’ll leave it at that. For those interested, see this Forbes piece (with some nice comments) and this GenomeWeb article.
There were representatives from two other nanopore platforms at the meeting. I spent some time talking to Arek Bibillo at his poster describing the Genia system, a biological pore that measures the molecules cleaved off nucleotides as they become incorporated during DNA synthesis. The platform has the advantage of having one molecule at a time in the pore (instead of multiple bases), and the signal drops to background levels between readings. I think the platform is promising, although they have not yet come very far. Apparently, the chips scale very well, with a 100,000-pore chip coming, and one with another factor of ten more pores planned.
Finally, I had a quick chat with Quantum Biosystems’ Nava Whiteford, a former Oxford Nanopore employee. They recently released the first data from their electronic nanopore, consisting of raw signals from reading a 21-mer miRNA. Quantum is at a really early stage, but I applaud them for being so open as to release raw data at this point. More data releases were promised.
A quick recap of what the big established sequencing platforms were presenting at AGBT:
Illumina had already wowed us at the JPMorgan conference (Genomeweb piece) and did not have much new to add to the news on the NextSeq 500 and HiSeq X Ten. This led to little buzz at AGBT.
Ion Torrent did not have any breakthroughs to report. In fact, I was disappointed to hear new planned release dates for the Ion Proton PII chip (early access May/June) and Ion Chef (early access ongoing), later than what we were told in October (the PII chip was then to be released in November 2013, Ion Chef before Christmas 2013).
PacBio can now truly be called an established platform. They had a massive presence with many talks and posters, including ours (which you can see here). The company’s biggest news was the release of 54x coverage of PacBio data from the human genome, and a corresponding assembly that outperforms the current reference. Have a look at PacBio’s blog for details. Every sequencing company ultimately wants to show that they can sequence the human genome. PacBio has, in my opinion, outperformed them all in that respect: they not only have the data, they have a fantastic de novo assembly. This means that de novo genome sequencing is going through a transformation: with the right funding and sufficient amounts of high-quality DNA, very high quality assemblies are now possible, reaching and often surpassing the gold standard from before.
Jason Chin, chief bioinformatician at PacBio, gave a presentation on his ongoing work towards a true diploid assembler. Mixing PacBio reads of two inbred, but different, samples of Arabidopsis to mimic a diploid species, he showed promising results of the Falcon assembler. Fully resolved, heterozygous (phased) assemblies are becoming possible!
Talking of assembling PacBio reads, the currently available way of using the reads is to either generate around 60-100x coverage so the reads can be used to correct themselves, or to use 50x or more short-read data for correction of the raw PacBio reads. The corrected reads are then used in an assembly. The ultimate goal is obviously to use the raw reads natively, assembling them without correction. This is challenging due to the high single-pass error rate, which makes finding true overlaps difficult. Besides our poster on that subject, there was an interesting talk from assembly guru Gene Myers (of Celera fame). He basically skipped the whole short-read era, but now, with the long PacBio reads, he’s back: developing a new assembler, called Dazzler (for Dresden Assembler). The program can assemble small and large genomes from around 30x raw data fairly quickly, with promising results. Gene Myers said that he plans to release the software in a couple of months. Jim Knight, the developer of Newbler (and a former student of Gene Myers), had a poster showing the first results of using raw PacBio reads in a hybrid assembly with Newbler. Very exciting developments, as this may be particularly helpful for large genomes, for which generating high-coverage PacBio datasets is very expensive.
Roche/454, well, after their announcement that they will shut down 454 in mid-2016, there was very little buzz around this platform. The only thing I picked up was that the long reads from the GS FLX+ are coming to the GS Junior.
I very much liked the talk by Jeffery Schloss, entitled “Ambitious Goals, Concerted Efforts, Conscientious Collaborations – 10 Years Hence”. He described ten years of history of the National Human Genome Research Institute’s program to enable and support technological developments towards the $1000 genome. Interestingly, he mentioned that in the beginning, the goal was a $1000 genome of the quality of the mouse genome published in 2002, implying a de novo assembled genome. These days, the $1000 genome only refers to resequencing a sample to, say, 30x. Schloss did not touch upon when, and why, this change happened. Dale Yuzuki has a blog post on the talk.
Talking about references: I am very interested in augmented reference genomes, where alternative sequences (different haplotypes) are represented such that the reference becomes more complete. There were two talks describing different approaches towards this goal. Valerie Schneider, from the National Center for Biotechnology Information, gave a talk on “Taking Advantage of GRCh38”, the newly released human reference. EDIT: link to her slides. For this release, many more alternative alleles were added. Valerie Schneider described new read mappers that take the alternatives into account, leading to better SNP calling. Another approach was described by Mark Garrison, in his talk about graph-based reference representations. Using prior information on variants, an extended reference can be built, again improving mapping results. I am very excited about these developments, as I hope this will help us represent the extensive variation we see in Atlantic cod.
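To give a flavour of what a graph-based reference looks like: nodes carry sequence, edges connect alternatives, and any path through the graph spells out one possible haplotype. A toy sketch (purely illustrative; the actual tools use far richer data structures and indexing):

```python
# Toy variation graph: a linear reference with one SNP site, where
# nodes 2 and 3 are the two alternative alleles.
graph = {
    "nodes": {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"},
    "edges": [(1, 2), (1, 3), (2, 4), (3, 4)],
}

def spell(path, g):
    """Concatenate node sequences along a path (one haplotype)."""
    return "".join(g["nodes"][n] for n in path)

print(spell([1, 2, 4], graph))  # ACGTATTCA (allele A)
print(spell([1, 3, 4], graph))  # ACGTGTTCA (allele G)
```

A read carrying either allele now matches some path through the graph exactly, rather than accumulating a mismatch against a single linear reference – which is the intuition behind the improved mapping results.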
Finally, my interest was piqued by two (or rather, three) technologies that take genome assemblies to a higher level. Beyond sequencing and assembly, there is a step to transform the scaffolds into chromosome-scale reconstructions.
Joshua Burton gave a talk titled “Chromosome-Scale Scaffolding of de novo Genome Assemblies Based on Chromatin Interactions”. He described a technique that uses Hi-C data, where chromosomal regions that are in close physical proximity get cross-linked, isolated and sequenced. The resulting data can be used to determine the order and orientation of scaffolds (the method has been published recently).
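The core intuition is that scaffolds sharing many chromatin contacts are likely adjacent on the chromosome, so ordering can be recovered from the contact counts. A minimal sketch of that idea, using made-up contact numbers and a naive greedy chain (the published method is far more sophisticated):

```python
# Hypothetical symmetric Hi-C contact counts between scaffold pairs.
contacts = {
    ("A", "B"): 120, ("B", "C"): 95, ("A", "C"): 10,
    ("C", "D"): 80, ("A", "D"): 2, ("B", "D"): 5,
}

def count(x, y):
    """Look up the contact count for an unordered scaffold pair."""
    return contacts.get((x, y), contacts.get((y, x), 0))

def greedy_order(scaffolds, start):
    """Chain scaffolds, always stepping to the strongest unused contact."""
    order, remaining = [start], set(scaffolds) - {start}
    while remaining:
        nxt = max(remaining, key=lambda s: count(order[-1], s))
        order.append(nxt)
        remaining.remove(nxt)
    return order

print(greedy_order(["A", "B", "C", "D"], "A"))  # ['A', 'B', 'C', 'D']
```

Because within-chromosome contacts are far more frequent than between-chromosome ones, the same signal also lets the method group scaffolds into chromosomes before ordering them.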
Another approach to the same problem is optical mapping. Here, individual, very long DNA molecules are labelled at restriction sites and imaged, and the resulting patterns are used to make longer ‘contigs’. Mapping the restriction sites present on scaffolds can then be used to link up the assembly scaffolds with the optical map. The technique can also be used for detecting structural variation. At AGBT, BioNanoGenomics performed a mapping experiment of the human genome in real time. I talked to a representative and was quite impressed with the technology.
Nabsys is a company developing a different technical solution to the same problem, with an electronic readout of the labels. They do not yet have an instrument for sale, but what they showed looked promising, not least because they can achieve a higher tag density (0.5–1 kb for Nabsys, something like 10 kb for BioNanoGenomics).
Notably absent at AGBT was OpGen, a direct competitor of BioNanoGenomics.
I feel these mapping approaches are going to be a very valuable addition to genomics, both for super-scaffolding assemblies and for cost-effective structural variation detection.
All in all, I enjoyed the meeting very much. I’ve (re)met many people, learned a lot, had good discussions and – I can’t deny it – had a lot of fun.