I’ve been thinking lately about the best way to construct a mapping pipeline for large sequence datasets which have been bisulphite converted. Bisulphite conversion is mostly used to detect DNA methylation, although other uses are also being found. The basic principle is that treating DNA with sodium bisulphite modifies cytosine bases such that when they…
Managing Really Large Data Sets
For a while now I’ve been working with next-generation sequencing datasets. Each dataset consists of around 10 million mapped genome positions, and an experiment can consist of 10 or more datasets. When analysing this data, memory usage is a major issue. Up until now our approach has been to try to store everything in…