We recently hit a problem when trying to run a fairly simple script through grid engine. The script used an internal redirection to merge STDERR into STDOUT before doing a final transformation on the result. A simplified version of what the script did is something like this: echo hello 2>&1 | sed s/^/prefix:/ …
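As a minimal sketch of what that kind of redirection does (not the original script, and run outside any scheduler), the snippet below uses a hypothetical helper, `emit_both`, that writes one line to each output stream. The helper name and its messages are assumptions made purely for illustration.

```shell
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Without 2>&1 only stdout enters the pipe; stderr is discarded here
# so that the difference is easy to see.
emit_both 2>/dev/null | sed 's/^/prefix:/'

# With 2>&1 stderr is redirected into stdout before the pipe,
# so sed sees (and prefixes) both lines.
emit_both 2>&1 | sed 's/^/prefix:/'
```

In the second pipeline both lines reach `sed`, which is the merged behaviour the simplified command relies on.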
- Archive by category "Computing"
Fast subset selection by row name in R
Introduction One of the best features of R is the simple way you can use a number of different strategies to create subsets of large data tables. The basic selection mechanisms let you subset a data frame by providing: a set of column or row indices; a set of boolean values…
Why is java not found on the command line after I’ve installed it?
The Problem A common problem reported by users of SeqMonk is that when they try to launch the program on a Windows system they get an error message saying that java could not be found, even though they have installed it and can show that it works fine in their browser. When…
Generating R reports with vector images from markdown with knitr
Introduction One really nice addition to a standard R environment is the ability to create reports which combine R code, comments and embedded graphical output. The original mechanism for doing this was Sweave, but more recently a second system called knitr has emerged which seems to be more flexible, and this is what I’ve been…
Syncing calendars between iOS and Android
Introduction Having happily used a combination of OS X and iOS for a few years now, I’ve recently expanded my device collection with a Google Nexus 7 tablet. I’ve been using a separate IMAP-based email account for a while, so this was easy to set up on the new device, but I also wanted…
The true cost of object creation in java
I’ve been spending some time trying to optimise the data loading part of one of my java projects. The nature of the data we use means that we have to create hundreds of millions of objects, each of which internally stores only a single long value (it actually stores several fields packed into this value…
Moving over to Casava 1.8
Introduction Illumina have recently released an updated version of their downstream analysis software CASAVA. This is the analysis pipeline which runs after the sequencer has processed the raw data down to base call files, and it provides a range of functions, from creating usable base calls to alignment and variant calling. Casava 1.8 makes some major…
Getting the java heap size you asked for
In a recent post I discussed a method we’re using for automatically setting the java heap size appropriately at runtime. It now turns out that setting the heap size is complicated by the fact that the heap size you request on the command line isn’t necessarily the one you are given. In some…
Mac application bundle caching
Having spent a frustrating hour or so trying to update a Mac application bundle, I thought I’d share a couple of things which caused no end of confusion. They aren’t what you’d expect, and are therefore likely to catch out those working with application bundles for the first time. Basically I was finding that although…
Dynamically setting the java heap size at runtime
One of the oddities about java programs is that they require you to set a maximum heap size when you start the program. What this means in effect is that you need to be able to predict the memory usage of your program before it starts, and whatever heap size you set needs to be…