Over the last few years I’ve had to review tens of software application papers. I’ve also been involved in writing a few of these (and have also consciously decided not to write papers for some of our software due to the amount of hassle involved). In many ways software papers are some of the worst…
Key points for developing training
At the ISMB workshop for Education in bioinformatics, Gabriella Rustici gave an interesting talk on her “Top 10 tips for setting up a bioinformatics training course”. I agreed with pretty much all of the points and she covered a lot of the types of issues raised elsewhere (make course objectives clear, use well supported software,…
Merging STDERR and STDOUT in a gridengine submission
We recently hit a problem when trying to run a fairly simple script through grid engine. The script used an internal redirection to merge together STDERR and STDOUT before doing a final transformation on the result. A simplified version of what the script did is something like this: echo hello 2>&1 | sed s/^/prefix:/ …
Fast subset selection by row name in R
Introduction One of the best features about R is the simple way you can use a number of different strategies to create subsets of large data tables. The basic selection mechanisms you have are that you can subset a data frame by providing: A set of column or row indices A set of boolean values…
A new way to look at duplication in FastQC v0.11
Introduction After a long gestation we’ll be releasing a new version of FastQC in the near future to address some of the common problems and confusions we’ve encountered in the current version. I’ll write more about this in future posts but wanted to start with the most common complaint, that the duplicate sequence plot was…
Generating R reports with vector images from markdown with knitr
Introduction One really nice addition to a standard R environment is the ability to create reports which combine R code, comments and embedded graphical output. The original mechanism for doing this was Sweave, but more recently a second system called knitr has emerged which seems to be more flexible, and this is what I’ve been…
Should you buy a nanopore sequencer?
This morning Twitter is awash with posts discussing the newly announced nanopore sequencers from Oxford Nanopore. Speculation has been rife for some time about the potential specifications of the first sequencers to be produced by the company, and it certainly appears that the company have fulfilled the expectations placed upon them. I’m not going to…
Moving over to Casava 1.8
Introduction Illumina have recently released an updated version of their downstream analysis software CASAVA. This is the analysis pipeline which runs after the sequencer has processed the raw data down to base call files and provides a variety of functionalities from creating usable base calls to alignment and variant calling. Casava 1.8 makes some major…
Importing RNA-Seq data into SeqMonk
Introduction Mapped RNA-Seq data coming from eukaryotes is probably the most complicated data type to import into SeqMonk due to its relative complexity and the abundance of options with which you are presented. Depending on exactly what sort of information you want to know about your data, different data import options will be useful, so…
Want to improve your science? Get a dog.
Actually the dog is somewhat irrelevant – it’s what comes with it which matters. One of the side-effects of dog ownership is that you get to spend an hour or so a day out walking, which means you have an hour or so with your own thoughts and no distractions. I’m sure everyone has experienced…