Introduction: In the current Illumina pipeline, raw sequence data is generated in qseq files, but it can optionally be converted to the more standard FastQ format for use with other analysis programs. The FastQ files produced are uncompressed text files and take up a considerable amount of space in our storage system. We’ve therefore been thinking…
Adding custom chromosome name mappings into SeqMonk
When loading data into SeqMonk the program has to try to connect the chromosome names used in your data file with those which are present in the genome on which your project is based. In many cases there won’t be an exact match between the two – many mapping programs report file names in their…
Interpreting the duplicate sequence plot in FastQC
Background: The one analysis module which seems to elicit more questions than any other is the duplicate sequence plot. Of all of the plots which the program generates, it’s probably the one which causes the most warnings / errors in otherwise nice-looking data. I’m happy to admit that it’s not always immediately obvious what…
Published: May 23, 2011
How good is ‘good enough’ for research software
There are two linked problems which seem to face me with every piece of software I write for research use: when is the software complete enough to write a paper on it, and how to manage the versions and project description. I think that although similar questions arise within software written for general use, their answers…
Where do you analyse next gen sequence data?
We had an interesting discussion at the Bioinfo-core workshop at ISMB2010. The discussion centred around the best way to handle the logistics of making sequence data available to those using a sequencing service. The problem is that the data is so big that even if you have a large central store you run the risk of…
Mapping Bisulphite Converted Sequence
I’ve been thinking lately about the best way to construct a mapping pipeline for large sequence datasets which have been bisulphite converted. Bisulphite conversion is mostly used to detect DNA methylation, although other uses are also being found. The basic principle is that treating DNA with sodium bisulphite modifies cytosine bases such that when they…
Managing Really Large Data Sets
For a while now I’ve been working with next generation sequencing datasets. Each dataset consists of around 10 million mapped genome positions, and an experiment can consist of 10 or more datasets. When analysing this data, memory usage is a major issue. Up until now our approach has been to try to store everything in…
Scientific Instrument Software
As a bioinformatician I find myself spending too much of my time working around poor software supplied with scientific instruments. I’m continually amazed that hardware which can cost hundreds of thousands of pounds is very often let down by the control and analysis software supplied with it. I suspect that the fault for this lies…