When Nick, Josh and I got together at the Newton Institute hackathon in 2014, we wanted to see if assembling nanopore data was possible. Over the week we put together a pipeline, inspired by pbdagcon, which used DALIGNER and poa to correct sequencing errors in nanopore reads. This software, which we called nanocorrect, improved the accuracy of our nanopore reads to around 97% after two rounds of correction. The corrected reads worked well in the Celera Assembler, and we had very long contigs soon after the hackathon.
To complete our de novo assembly paper we wrote a second, much more powerful software package called nanopolish, which uses the nanopore signal data to improve the accuracy of the assembly. Nanopolish has since grown to support aligning signal events to a reference genome and calling SNPs. Where nanocorrect was a simple, quick solution to error correction, nanopolish is a long-term project that I’ve spent most of my time on over the last year.
Nanocorrect’s major flaw is that it is very slow, specifically the poa step, which constructs a partial order alignment between sets of overlapping reads. In the year since we published our paper, a few other nanopore-compatible assemblers have appeared, namely canu and miniasm. These programs are clearly better at building contigs than our nanocorrect + Celera Assembler (CA) pipeline, so we’ve decided to deprecate nanocorrect. Our current suggestion is to build contigs with canu and use nanopolish to compute the final consensus sequence.