Instructor training at the 2017 Data Intensive Biology Summer Institute at UC Davis

[Adapted from Titus Brown’s blog post]

Titus Brown has been so kind as to invite me to co-instruct this week-long workshop (thanks!). So I thought I would advertise it a bit:

Are you interested in

  • Getting started with, or getting better at, teaching the Analysis of High Throughput Sequencing Data
  • Hands-on training in good pedagogical practice
  • Becoming a certified Software/Data Carpentry instructor
  • Learning how to repurpose and remix online training materials for your own needs

… then this one-week workshop is for you!

When: June 18-June 25, 2017 (likely we’ll only use Monday-Friday).
Where: University of California, Davis, USA
Instructors: Karen Word, C. Titus Brown, and Lex Nederbragt

This workshop is intended for people interested in teaching, reusing and repurposing the Software Carpentry, Data Carpentry, or Analyzing High Throughput Sequencing Data materials. We envision this course being most useful to current teaching-intensive faculty, future teachers and trainers, and core facilities that are developing training materials.

Attendees will learn about and gain practice implementing evidence-based teaching practices. Common pitfalls specific to novice-level instruction and bioinformatics in particular will be discussed, along with associated troubleshooting strategies. Content used in prior ANGUS workshops on Analyzing High Throughput Sequencing Data will be used for all practice instruction, and experienced instructors will be on hand to address questions about implementation.

Attendees of this workshop may opt to stay on for the ANGUS two-week workshops that follow, so that they can gain hands-on experience in preparing and teaching a lesson.

This week-long training will also serve as Software/Data Carpentry Instructor Training.

Attendees should have significant familiarity with molecular biology and basic experience with the command line.

We anticipate a class size of approximately 25, with 3-6 instructors.

The official course website is here.

Apply here!

Applications will close March 17th.

The course fee will be $350 for this workshop. On-campus housing may not be available for this workshop, but if it is, room and board will cost approximately $500 per week on top of the course fee (see venue information). Alternatives will include local hotels and Airbnb.


If you have questions, please contact dibsi.training@gmail.com.

The Roche/454 Life Sciences GS FLX and GS Junior: an obituary

You were a pioneer, the first successful ‘next generation’ (if you’ll pardon the term) commercially available sequencing platform in 2005. You just beat Solexa, but it was a fairly close call.

The author (left) with colleagues showing off their 454 GS FLX

Your greatest accomplishment was to show that pyrosequencing, which had already been around for a while, could be scaled up, both in terms of read length and parallelisation. You started the revolution in DNA sequencing, suddenly making large-scale genomic projects accessible to labs that traditionally could only dream of doing projects at this scale.

Continue reading

Developments in high throughput sequencing – July 2016 edition

This is the fifth edition of this visualisation; previous editions were in June 2015, June 2014, October 2013 and December 2012.

As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument (different from last time) seems to be missing. Hang on, I’m coming back to that…

[Figure: full run throughput in gigabases versus single-end read length for the different sequencing platforms, both on a log scale]

Notable changes from the June 2015 edition

  • I added the Illumina MiniSeq
  • I added the Oxford Nanopore MinION. The read length for this instrument was based on the specifications for maximal output and number of reads from the company’s website. The two data points represent ‘regular’ and ‘fast’ modes.
  • I added the IonTorrent S5 and S5XL. You may notice that the line for this instrument has a downward slope. This is because 400 bp reads are only available on the 520 and 530 chips, not on the higher-throughput 540 chip, which makes the maximum throughput for this read length lower than for the 200 bp reads.
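
The downward slope for the S5 can be reproduced with a small sketch: for each read length, take the highest throughput over all chips that support it. The throughput numbers below are made-up placeholders chosen only to show the effect, not the specifications used for the plot, and the function name is my own invention for this illustration.

```python
# Hypothetical per-run output in gigabases, keyed by (chip, read length in bp).
# These values are placeholders for illustration, not vendor specifications.
RUN_OUTPUT_GB = {
    ("520", 200): 1.0,
    ("520", 400): 2.0,
    ("530", 200): 4.0,
    ("530", 400): 8.0,
    ("540", 200): 15.0,  # the 540 chip has no 400 bp mode
}


def max_throughput_by_read_length(run_output):
    """Return {read_length: highest throughput over all chips supporting it}."""
    best = {}
    for (_chip, read_length), gigabases in run_output.items():
        best[read_length] = max(best.get(read_length, 0.0), gigabases)
    # Sort by read length so the points are in plotting order.
    return dict(sorted(best.items()))


points = max_throughput_by_read_length(RUN_OUTPUT_GB)
for read_length, gigabases in points.items():
    print(f"{read_length} bp: {gigabases} Gb")
```

With these placeholder values, the 200 bp point comes from the 540 chip and the 400 bp point from the 530 chip, so the line connecting them slopes downward.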

Continue reading

Developments in high throughput sequencing – June 2015 edition

This is the fourth edition of this visualisation; previous editions were in June 2014, October 2013 and December 2012.

As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument seems to be missing. Hang on, I’m coming back to that…

Continue reading

A hybrid model for a High-Performance Computing infrastructure for bioinformatics

I work for the Norwegian High-Throughput Sequencing Centre (NSC), but at the Centre for Ecological and Evolutionary Synthesis (CEES). At CEES, numerous researchers run bioinformatic analyses, or other computation-heavy analyses, for their projects. With this post, I want to describe the infrastructure we use for calculations and storage, and the reason why we chose to set these up the way we did.

In general, when a researcher or group of researchers needs high-performance computing (HPC) infrastructure, they can purchase it and house it in or around the office, or use a cloud solution. Many, if not most, universities offer a computer cluster for their researchers’ analysis needs. We chose a hybrid model between using the university’s HPC infrastructure and setting one up ourselves. In other words, our infrastructure is a mix of self-owned resources and shared resources that we either apply for or rent.

Continue reading

On graph-based representations of a (set of) genomes

In 1986, in a letter to the journal Nature, James Bruce Walsh and Jon Marks lamented that the upcoming human genome sequencing project “violates one of the most fundamental principles of modern biology: that species consist of variable populations of organisms”. They further wrote: “As molecular biologists generally ignore any variability within a population, the individual whose haploid [sic] genome will be chosen will provide the genetic benchmark against which deviants are determined”. They concluded that “‘the’ genome of ‘the’ human will be sequenced gel by acrylamide gel”.

We have come a long way when it comes to taking population variation into account in molecular/genetic/genomic studies. But these sentiments, expressed already in 1986, echo some of the trends in the human genetics field: the move away from a single, linear representation of ‘the’ human genome. In this post I will provide some background, explain the reasons for moving towards graph-based representations, and indicate some challenges associated with this development.

Continue reading

Thoughts on a possible Assemblathon3

The Genome10K meeting is ongoing (I am not attending, but am following it through Twitter). Today, there will be a talk by Ian Korf about the feasibility of an Assemblathon 3 contest (see this tweet and the schedule). Earlier, the @Assemblathon Twitter account asked for a wishlist for an Assemblathon 3 through the hashtag #A3wishlist. With this post I want to share my opinion on what a possible Assemblathon 3 could and/or should be about.

Continue reading

My review of “MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island”

Earlier this week, the first paper was published describing the use of Oxford Nanopore MinION data to solve a biological question. The paper, entitled “MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island” came out in Nature Biotechnology (ReadCube link).

I was a reviewer for this manuscript. I have posted my two (signed) review reports on Publons. As data and code were made available by the authors (as it should be), I made a (mostly successful) effort to reproduce the computational part of the paper. After I was done with the review report of the second version, I could not help but take a further look at some of the results. This led to me sending some plots to the authors, and one of these plots ended up becoming figure 1. This was a lot of fun to see in the final version.

Below are some excerpts of the review reports.

Continue reading

My review of “Long-read, whole-genome shotgun sequence data for five model organisms”

Two days ago, a paper appeared in Nature Scientific Data by Kristi Kim et al., titled “Long-read, whole-genome shotgun sequence data for five model organisms”. This paper describes the release of whole-genome PacBio data by Pacific Biosciences and others for five model organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster), using quite recent chemistries.

Beyond the datasets described in the paper, Pacific Biosciences also released whole-genome data for the human genome and, very recently, for Caenorhabditis elegans using the latest P6/C4 chemistry. Check out PacBio devnet, which also has data for other applications.

I think it is fantastic that Pacific Biosciences releases these datasets as a service to the community, and obviously to showcase their technology. Company-generated data often represents the best possible data, as it is produced by people with a great deal of experience with the technology. It remains to be seen if ‘regular’ owners of PacBio RS II instruments can reach the same level of data quality. Nonetheless, these datasets are very helpful for teaching (see my previous blog post), for comparisons with other technologies (I wish I could make time to thoroughly compare PacBio data to Moleculo data available from the same species), as well as for the development of new software applications.

Continue reading

Our review of “Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data”, aka the HGAP paper

As it is out in the open that I was one of the reviewers of the ‘HGAP’ paper, I thought I might as well make my review publicly available.

I have posted the review report (from February 2013) online at Publons. The review was actually done together with a PhD student in the group, Ole Kristian Tørresen (I like to do reviews together with others, it leads to better reviews and is a great learning experience for students!).

Here are the first few paragraphs. Enjoy!

Continue reading