You were a pioneer, the first successful ‘next generation’ (if you’ll pardon the term) commercially available sequencing platform in 2005. You just beat Solexa, but it was a fairly close call.
Your greatest accomplishment was to show that pyrosequencing, which had been around for a while already, could be scaled up, both in terms of read length and parallelisation. You started the revolution in DNA sequencing, suddenly making large scale genomic projects available to labs that traditionally could only dream of doing projects at this scale.
As before, full run throughput in gigabases (billion bases) is plotted against single-end read length for the different sequencing platforms, both on a log scale. Yes, I know a certain new instrument (different from last time) seems to be missing, hang on, I’m coming back to that…
Notable changes from the June 2015 edition
- I added the Illumina MiniSeq
- I added the Oxford Nanopore MinION. The read length for this instrument was based on the specifications for maximal output and number of reads from the company’s website. The two data points represent ‘regular’ and ‘fast’ modes.
- I added the IonTorrent S5 and S5XL. You may notice that the line for this instrument has a downward slope. This is because the 400 bp reads are only available on the 520 and 530 chips, not on the higher-throughput 540 chip, which makes the maximum throughput for this read length lower than for the 200 bp reads.
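The read length estimate for the MinION mentioned above is simple arithmetic: maximal output divided by the number of reads. Here is a minimal Python sketch of that calculation. The function name and the numbers are placeholders I made up for illustration; they are not the company's specifications.

```python
def mean_read_length(total_bases: float, n_reads: float) -> float:
    """Average read length implied by a run's total output and read count."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return total_bases / n_reads


# Placeholder values for illustration only (NOT real instrument specifications).
example_output_bases = 1_000_000_000  # 1 gigabase of total output
example_reads = 200_000               # number of reads in the run

print(mean_read_length(example_output_bases, example_reads))  # 5000.0
```

Note that this gives an average, so the real read length distribution of a run can look quite different from the single point on the plot.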
I’m done. No more. From now on, for me it is Open Science, or nothing. I will no longer do closed science.
I did my PhD in the late 1990s, and was educated in the ‘classical mode’ of doing science: do your work, get it published in a journal with as high an impact factor as possible, write grants or apply for a postdoc to do more such science. As a result, articles that resulted from my PhD work appeared in closed access journals.
Fast forward more than 10 years, and science is changing, opening up: Open Access is gaining traction, Open Source software is what I use every day, Open Data is trending. And I am sold, have been for a few years now. Open Science is the best way to guarantee scientific progress, to spread knowledge fast, and to stimulate collaboration and reduce competition. To use the famous quote:
The opposite of ‘open’ isn’t closed. The opposite of open is ‘broken.’ – John Wilbanks
But opening up has, for me, been hampered by working in a research environment that is very slow in its uptake of Open Science. The lure of the Impact Factor is still very much present, as is the fear of being scooped by early sharing of data. I can’t really blame any of my colleagues: it has been their way of doing science for as long as they can remember. They will stick to this way of pursuing science unless the scientific reward system changes, from its reliance on publications in high Impact Factor journals needed for grants, which again are needed for a career, towards recognising the true value and impact of a scientist’s contribution towards advancing knowledge. But for me, it is enough now. I am opening up.
I have been pretty open already, posting my presentations to SlideShare, making posters and illustrations available through figshare, putting scripts and teaching material on GitHub, posting my peer review reports to Publons and blogging about all of this. But there is still so much of what I do that is hidden, closed, unavailable to interested colleagues and potential collaborators. This ends now. I am going to open up fully.
I used to think it was no use taking this step, as I am not an independent scientist, I do not have my own funding or my own lab. Rather, I work together with many others or (co)supervise students on different projects. So I didn’t really feel I ‘owned’ the research, the data or the code being produced, and felt I was not in a position to open it up. I also feared I would close doors by using openness as a criterion for my participation in a project. But I have now realised that it is too much a matter of principle for me, that if I want a career in science, it has to be on my conditions, that is, open. Whatever the consequences.
Others before me have taken this step, and in true reuse-what-is-open spirit, I have made my own version of Erin McKiernan’s Open pledge.
My pledge to be open:
- I will not edit, review, or work for closed access journals
- I will blog my work and post preprints, when possible
- I will publish only in open access journals
- I will pull my name off a paper if coauthors refuse to be open
- I will share my code, when possible
- I will share my raw and processed data, when possible
- I will practice open notebook science, when possible
- I will share my review reports
- I will speak out about my choices
Adapted from McKiernan, Erin (2015): Open pledge.
I will work towards retroactively applying the pledge to current collaborations and papers that are underway, where possible. For new collaborations, being able to adhere to the pledge will be a condition of my participation. New grant proposals I am invited into as a collaborator will have to contain clauses that make it clear I can adhere to the pledge while participating in the research upon funding.
I am excited to fully join the Open Science movement, and look forward to what it will bring.
As part of my training to become an instructor-trainer for Software and Data Carpentry, I want to help further develop the material used during instructor training workshops. Greg Wilson, who heads the instructor training, and I decided to make some videos to demonstrate good and not-so-good practices when teaching workshops. Greg recently released his “example of bad teaching” video focussing on general teaching techniques.
For my contribution, I wanted to demonstrate as many aspects as I could of what I wrote in my “10 tips and tricks for instructing and teaching by means of live coding” post.
So here was the plan:
- make two 2-3 minute videos with contrasting ways of doing a live coding session
- one demonstrates as many ways as possible how to not do this
- one uses as many good practices as possible
- during the instructor-training workshop, participants are asked (in small groups) to discuss the differences and their relevance.
With help from colleague Tore Oldeide Elgvin (the cameraman) and local UiO Carpentry organisers Anne Fouilloux and Katie Dean (playing the role of learners), we recorded the videos. It took about two hours and a dozen attempts, but it was fun to do. Amazing how difficult it is to not do your best while teaching…
Here are the videos – watch them before you read on about what they were supposed to show. Note that (part of) the unix shell ‘for-loop’ lesson is what is being taught. It is assumed the instructor has already explained shell variables (when to use the ‘$’ in front and when not).
Many thanks to Tore, Anne and Katie for helping out making these videos!
What the ‘how not to do it’ video shows:
- instructor ignores a red sticky clearly visible on a learner’s laptop
- instructor is sitting, mostly looking at the laptop screen
- instructor is typing commands without saying them out loud
- instructor uses fancy bash prompt
- instructor uses a small font in a terminal window that is not full-screen and has a black background
- the bottom of the terminal window is partially blocked by the learners’ heads for those sitting in the back
- instructor receives a pop-up notification in the middle of the session
- instructor makes a mistake (a typo) but simply fixes it without pointing it out, and redoes the command
What the ‘good practices’ video shows:
- instructor checks if the learner with the red sticky on her laptop still needs attention
- instructor is standing while instructing, making eye-contact with participants
- instructor is saying the commands out loud while typing them
- instructor moves to the screen to point out details of commands or results
- instructor simply uses ‘$ ‘ as bash prompt
- instructor uses a big font in a wide-screen terminal window with a white background
- the bottom of the terminal window is above the learners’ heads for those sitting in the back
- instructor makes a mistake (a typo) and uses the occasion to illustrate how to interpret error messages
From March 14 to 18, 2016, we organised the first Carpentry week at the University of Oslo. After a mini-Seminar on Open Data Skills, there was a Software Carpentry workshop, two Data Carpentry workshops and a workshop on Reproducible Science as well as a ‘beta’ Library Carpentry workshop.
The Software and Data Carpentry effort at the University of Oslo, aka ‘Carpentry@UiO’, really started in 2012 when I invited Software Carpentry to give a workshop at the university. The then director, Greg Wilson, came himself and gave an inspirational workshop, recruiting Karin Lagesen and me to become workshop instructors in the process. Karin and I graduated from instructor training in spring 2013 and have since given a couple of workshops in Oslo and elsewhere.
Teaching in general, and at Software and Data Carpentry workshops in particular, gives me great pleasure and is one of the most personally rewarding activities I engage in. With Software Carpentry, I feel I belong to a community that shares many of the same values I have: openness, tolerance, a focus on quality in teaching to name a few. The instructor training program is the best pedagogical program I know of, and it is amazing to see how Software and Data Carpentry are building a community of educators that are fully grounded in the research on educational practices.
Being an instructor is my way of making a small, but hopefully significant, contribution to improving science, and thus the world.
This testimonial can also be found here.
I get asked about this a lot, so I thought I would put together a quick blog post on it.
Disclaimer: this is the advice I usually give people and is given without warranty. As they say, Your Mileage May Vary.
Main advice: bite the bullet and get the budget to get 100x coverage in long PacBio reads. 50-60x is really the minimum. Detailed advice:
Sequencing and assembly
- get 100x PacBio with the latest chemistry, aiming for the longest reads (make sure the provider has a Sage Science BluePippin or something similar)
- get 100x HiSeq paired end regular insert
- run PBcR on the PacBio reads; this is part of the Celera Assembler. It corrects the longest raw reads and assembles them using Celera (long run time). Make sure to install the latest Celera release, which uses the much faster MHAP approach for the correction.
- alternative is FALCON https://github.com/PacificBiosciences/FALCON
- run quiver for polishing the assembly using ALL raw PacBio reads, see tips here
- you could repeat the polishing if that changes a lot of bases and does not negatively impact validation
- polish using the HiSeq reads with Pilon
- increase contiguity using BioNanoGenomics data
- create pseudo chromosomes using a linkage map (software?)
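For budgeting, the coverage targets above translate into sequencing yield through coverage = total bases / genome size. Here is a small Python sketch of that arithmetic. The function names and the 500 Mbp genome size are hypothetical, chosen only for illustration; plug in your own estimate.

```python
def bases_needed(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases required to reach a target coverage."""
    return genome_size_bp * coverage


def coverage_from_yield(total_bases: float, genome_size_bp: float) -> float:
    """Coverage obtained from a given sequencing yield."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


# Hypothetical genome size for illustration only.
hypothetical_genome_bp = 500_000_000  # 500 Mbp

for target in (50, 60, 100):
    gbp = bases_needed(hypothetical_genome_bp, target) / 1e9
    print(f"{target}x coverage needs about {gbp:.0f} Gbp of reads")
```

Running the same numbers backwards with `coverage_from_yield` is a quick check on whether a quote from a sequencing provider actually gets you above the 50-60x minimum.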