Posts Tagged ‘java’

The true cost of object creation in java

Tuesday, December 6th, 2011

I’ve been spending some time trying to optimise the data loading part of one of my java projects.  The nature of the data we use means that we have to create hundreds of millions of objects, each of which internally stores only a single long value (it actually stores several fields packed into this value using a bitmask since this is more memory efficient).

When loading our data we are therefore parsing hundreds of millions of long values and creating the associated objects.  This can take a few minutes to complete, and having profiled the code it seems that it is the object creation which slows everything down.  I therefore did some tests to work out exactly how slow the creation of objects is relative to the primitives which exist in java.  My test code is below:

public class CreateTest {

    public static void main (String [] args) {
        long start = System.currentTimeMillis();

        long [] primitives = new long[50000000];
        for (int i=0;i<primitives.length;i++) {
            primitives[i] = i;
        }

        long end = System.currentTimeMillis();

        System.out.println("Making 50 million longs took "+(end-start)+"ms");

        start = System.currentTimeMillis();

        Long [] objects = new Long[50000000];
        for (int i=0;i<primitives.length;i++) {
            objects[i] = new Long(i);
        }

        end = System.currentTimeMillis();

        System.out.println("Making 50 million Longs took "+(end-start)+"ms");
    } 
}

It’s reasonable to think that there will be an overhead for object creation, but I was surprised by the results:

Making 50 million longs took 199ms
Making 50 million Longs took 10809ms

So that’s a 50-fold overhead for the object wrappers around these numbers.  What’s worse is that this overhead seems to happen inside the JVM in such a way that you can’t take advantage of multi-threading to get around it.  I tried refactoring the code to have 5 threads creating 10million reads each, and the total runtime across 5 cores was pretty much exactly the same as doing the same thing on a single core.  This means that if you want to have 50 million objects available in your program then you’re just going to have to wait 10 seconds for them, however many cores you want to throw at the problem.

I also investigated other options for object creation. Namely I made my object cloneable and then used clone() to create new instances rather than calling the constructor.  The constructors for my object are very lightweight, so it was disappointing, but not surprising to see that this had no appreciable effect on the time taken for object creation.

I’ve even toyed with the idea of just storing these objects as an array of longs and avoiding this overhead all together.  I could still extract the relevant data by using a set of static methods, but what I can’t then do is to sort these objects (which I need to) since there’s no way to do a custom sort in java without putting the data into objects (which would defeat the point).

I’m therefore stuck with the biggest bottleneck in my program being something which I know is able to be improved by 50X (and would then make everything hugely quick), but not within the confines of the java language.

Tags: , ,
Posted in Computing | Comments Off


Getting the java heap size you asked for

Friday, August 26th, 2011

In a recent post I discussed a method we’re using for automatically setting the java heap size appropriately at runtime. It now turns out that the issue of setting the heap size is complicated by the fact that the heap size you request on the command line isn’t necessarily what you get given. In some cases the differences are modest, but sometimes they can be significant – amounting to hundreds of megabytes of discrepancy.

The simple test I did was to compare the heap size requested by setting the -Xmx value on the java command with the actual amount of available memory as reported by Runtime.getRuntime().maxMemory().  What I found was that the relationship between these two values isn’t 1:1, isn’t fixed at a given ratio, and is platform (and indeed VM) dependent.

According to this bug report the actual implementation of -Xmx is VM-dependent, so that the value you supply on the command line is merely a suggestion to the VM and it’s free to do whatever it likes.  Because I’d like my software to work consistently on all platforms I therefore had a look at what the different VMs actually do.

The OSX VM actually stays very close to the requested amount of memory across the whole range of requested heap sizes.  The linux and windows VMs though overcommit at small heap sizes (there seems to be a minimum allowed heap size of ~10MB), but undercommit by up to 12% at larger heap sizes.  When you’re requesting a heap of serveral gigabytes in size a 12% loss is a significant amount of memory.

Our immediate solution to this problem is to do a trial run where we launch a small program which reports the actual heap size allocated.  We then relaunch the normal java command, increasing the heap request size by a correction factor calculated from the trail run.  This seems to produce consistent results on all platforms and gives us what we asked for in the first place.

Tags: , ,
Posted in Computing | Comments Off


Dynamically setting the java heap size at runtime

Friday, July 29th, 2011

One of the oddities about java programs is that they require you to set a maximum heap size when you start the program. What this means in effect is that you need to be able to predict the memory usage of your program before it starts, and whatever heap size you set needs to be appropriate for all of the system the program is going to run on and all of the datasets it will handle.

When you’re distributing a desktop application which needs to process tens of gigabytes of data this can be a problem.  Ideally you’d like to set a heap size of a few gigabytes to give yourself enough overhead to process even the largest of datasets.  However not all machines have that much RAM installed, and even if they do they require a 64 bit OS to be able to use more than 2GB of it on a single process.

Up until now we’ve resorted to setting a lowest common denominator heap size (1500M), and providing instructions for increasing this on systems which can handle it.  This is however very inelegant and means we have to warn users if they’re running out of memory and make them save, reconfigure and restart the application.

We have now moved over to using a system which dynamically sets the heap size to an appropriate value at runtime.  We do this by writing a Perl wrapper which launches the java application after having worked out the most appropriate heap size for the current system.

To do this we have to work out:

  1. Whether we have a 32 or 64 bit JRE to work with
  2. The amount of physical RAM in the machine

The heap size is set as 2/3 of the physical RAM which leaves enough overhead for the JRE and basic system processes. In addition we set a ceiling on the heap size.  For 32 bit systems this is 1500M which is the most you can practically use given the 2GB per process limit (you have to leave something for the JRE itself). For 64 bit systems we set the ceiling at 6GB. It’s not in our interest to set the heap size too high as this ends up resulting in long freezes during garbage collection, so we set it to the largest size we’re practically going to need.  We work out if we’re on a 64-bit system by parsing the output of java -version (it doesn’t matter if the OS is 64 bit if the JRE is still 32 bit).

Finding the amount of phyiscal RAM is a platform specific task. On windows we have to use the Win32 API. Under linux we parse the output of ‘free’, and on OSX and the BSDs we parse the output of top.

If the user doesn’t like our auto-configured heap size we allow them to override it by passing a -m argument to the wrapper script.

For unix-like OSs we don’t need to worry about perl being present, but for windows we compile the script into a windows binary using pp.  The Win32::API module is bundled into this binary.  On other platforms we don’t need to distribute this since it’s loaded dynamically at runtime only if perl’s $^O variable tells us we’re running under windows.  Under OSX we can run the wrapper nicely from the command line.  We’re still working out the best way to include this as part of an application bundle.

This isn’t perhaps the cleanest of solutions, but compared to the very manual process we had before it’s a lot easier for the users to get their systems set up optimally.

#!/usr/bin/perl
use warnings;
use strict;
use English;
use FindBin qw($RealBin);
use Getopt::Long;
use IPC::Open3;

my @java_args;
# See if they manually set a heap size
my $memory;

my $result = GetOptions(
                        'memory=i' => \$memory,
                         );

if ($memory) {
    if ($memory < 500) {
        die "Memory allocation must be at least 500M";
    }
}
else {
    $memory = determine_optimal_memory();
}
unshift @java_args,"-Xmx${memory}m";

exec "java",@java_args, "uk.ac.bbsrc.babraham.SeqMonk.SeqMonkApplication";

sub print_error {

    # We wrap errors like this so we can keep a windows shell window open
    # so the user can see any errors we generate

    my ($error) = @_;

    warn $error;

    $_ = <STDIN>;

    exit 1;
}

sub determine_optimal_memory {

    # We'll set a ceiling for the memory allocation.  On a 32-bit OS this is going
    # to be 1500m (the max it can safely handle), on a 64-bit OS we won't take more
    # than 6GB
    my $max_memory = 1500;

    # We need not only a 64 bit OS but 64 bit java as well. It's easiest to just test
    # java since the OS support must be there if you have a 64 bit JRE.

    my ($in,$out);
    open3(\*IN,\*OUT,\*OUT,"java -version") or print_error("Can't find java");
    close IN;
    while (<OUT>) {
        if (/64-Bit/) {
            $max_memory = 6000;
        }
    }
    close OUT;

    warn "Memory ceiling is $max_memory\n";

    # The way we determine the amount of physical memory is OS dependent.
    my $os = $^O;

    my $physical;
    if ($os =~ /Win/) {
        $physical = get_windows_memory($max_memory);
    }
    elsif ($os =~/darwin/ or $os =~ /bsd/i) {
        $physical = get_osx_memory($max_memory);
    }
    else {
        $physical = get_linux_memory($max_memory);
    }

    warn "Raw physical memory is $physical\n";

    # We then set the memory to be the minimum of 2/3 of the physical
    # memory or the ceiling, whichever is lower.
    $physical = int(($physical/3)*2);

    if ($max_memory < $physical) {
        return $max_memory;
    }

    warn "Using $physical MB of RAM to launch seqmonk\n";
    return $physical;

}

sub get_linux_memory {
    # We get the amount of physical memory on linux by parsing the output of free

    open (MEM,"free -m |") or print_error("Can't launch free on linux: $!");

    while (<MEM>) {
        if (/^Mem:\s+(\d+)/) {
            return $1;
        }
    }

    close MEM;

    print_error("Couldn't parse physical memory from the output of free");
}

sub get_osx_memory {

    # We get the amount of physical memory on OSX by parsing the output of top

    open (MEM,"top -l 1 -n 0 |") or print_error("Can't get amount of memory on OSX: $!");

    my $total_mem = 0;

    while (<MEM>) {
        if (/^PhysMem:.*?(\d+)M\s+used,\s+(\d+)M\s+free/) {
            $total_mem += $1;
            $total_mem += $2;
        }    
    }

    close MEM;

    unless ($total_mem) {
        print_error("Could't parse physical memory from the output of top");
    }

    return $total_mem;

}

sub get_windows_memory {

    warn "Getting windows physical memory\n";

    # This code was adapted from an answer posted by Tom Feiner on
    # stackoverflow
    #
    # http://stackoverflow.com/questions/423797/how-can-i-find-the-exact-amount-of-physical-memory-on-windows-x86-32bit-using-per

    my ($max_memory) = @_;

    eval {
        require Win32::OLE;
        Win32::OLE->import qw( EVENTS HRESULT in );
        1;
    } or do {
        print_error("Couldn't load Win32 module to determine windows memory");
    };

    my $WMI = Win32::OLE->GetObject( "winmgmts:{impersonationLevel=impersonate,(security)}//./" ) || print_error ("Could not get Win32 object: $OS_ERROR");
    my $total_capacity = 0;

    foreach my $object (in($WMI->InstancesOf( 'Win32_PhysicalMemory' ))) {
        $total_capacity += $object->{Capacity};
    }

    my $total_capacity_in_mb = $total_capacity / (1024*1024);

    return $total_capacity_in_mb;
}

Tags: ,
Posted in Computing | 1 Comment »


Choosing the best format for raw sequence data

Thursday, June 16th, 2011

Introduction

In the current Illumina pipeline raw sequence data is generated in qseq files, but can optionally be converted to the more standard FastQ format for use with other analysis programs.  The FastQ files produced are uncompressed text files and take up a considerable amount of space in our storage system.  We’ve therefore been thinking about either compressing or converting these files to save on the amount of storage they require.

At the same time we’ve also been expanding the range of compression schemes supported in FastQC which gives us a good impression of how quickly we can extract data from the different available formats and since I’ve collected some data on the storage and processing requirements for the different formats available I thought I’d share these to help inform others who may be making similar decisions.

The choices we have are to simply compress the fastq files with a standard compression scheme.  The two most commonly used are gzip and bzip2.  Many analysis programs support gzipped fastq files as input in addition to uncompressed files, and a few are starting to support bzip2.  Bzip2 is often chosen over gzip because it compresses data more efficiently.  The other choice is to put the raw data into a BAM file.  This format is specifically designed to hold high throughput sequence data and uses a compression scheme which is designed to be optimal for sequence data.  The BAM format was primarily designed to hold information about sequences which had been mapped to a reference sequence, but it also allows for raw sequences with no associated mapping to be stored, but with some overhead for the mapped position fields which are not used.

For the tests I used a fastq file containing around 500,000 reads of 33bp in length.  The processing times were taken as the time to process the file completely with FastQC.  Since the processing overhead for the QC analysis should be the same in all cases any differences will be attributable to the different amount of data needing to be read from disk, and the CPU time required for the decompression.  The tests were run on a MacBook Pro laptop (so not the fastest hard drive, or the speediest CPU).  FastQC uses pure java decompression code.  For gzip compression this is built into the JRE and for bzip2 I used the jbzip2 library.

Results

File type File size Time to process (seconds)
Uncompressed FastQ 69.8MB 14.1
Gzip Compressed FastQ 17.5MB 11.0
Bzip2 Compressed FastQ 13.9MB 72.1
BAM 16.3MB 11.5

Conclusions

It is clear that converting your raw FastQ files to a more efficient storage format will produce significant gains in disk space usage.  Reducing your storage requirements by a factor of 4-5X and actually making your processing more efficient in some cases is a win-win proposal.

From the results presented it seems clear that unless disk usage is critically important then bzip2 compression is not a viable solution. Increasing your processing time by over 500% for a 20% reduction in size does not seem to be a good trade off.  I’ve also checked using the command line gzip/bzip2 decompression utilities in case the effect I saw was an artefact of the java implementations, but the size of the difference between the two was similar there as well.

Choosing between gzip and BAM is less clear.  It’s probably fair to say that at the moment more analysis programs support gzipped fastq files as input than support BAM files (which is more normally seen as an output format), but this may change in the future.  BAM files offer the prospect of adding in mapping data alongside your sequence data with a minimised increase in filesize which may be a benefit to some.  If you’d prefer to keep your raw data separate from derived data then gzipped FastQ files would seem to be the better choice.

In our case we’re going to opt for simply gzipping our FastQ files since this seems to be a simple process which won’t affect any of our existing workflows or processing and which will return to us a significant amount of storage space.

Tags: , ,
Posted in Bioinformatics, Computing | Comments Off


Sound Level Monitoring in Java

Tuesday, October 6th, 2009

For a project I’ve been working on I needed what I thought should be fairly easy to make – a simple widget to monitor an input sound level.  I’d never worked with the full javax.sound API before, but assumed that there would be ample documentation to do what I needed.

Having started on the project things turned out to be a bit more tricky than I thought.  Most of the examples I found revolve around passing sound from an input to an output, and the sound API makes this fairly easy.  What proved to me more tricky was the intercepting and processing of a sound input.  I’ll therefore go through what I did to make this work in the end.

Let’s start with the basics:

What you need at the end of the day is a TargetDataLine object.  The nomenclature of the javax.sound API is somewhat unusual in that a SourceDataLine is a sound output line (such as a set of speakers), whereas a TargetDataLine is a sound input line (such as a microphone).

You have two choices for getting a TargetDataLine, you can either go through a rigorous process of exploring the capabilities of all of the sound lines on your system and find some way to select the best match to what you want.  Alternatively you can decide on the type of line you want and request that from the sound system.  If this type of line is not available then you will get a LineUnavailableException.

Having tried both approaches I ended up plumping for the second option.  Since I didn’t need high precision and wasn’t asking anything difficult of my sound source I could select conservative properties for my line and trust that nearly all sound cards would be able to support this.  In practice this code has worked on every machine I’ve tried it on, but conceivably it could fail on some sound cards.

You need two things to get a TargetDataLine.  An AudioSystem and an AudioFormat object.  The AudioSystem class provides a series of methods through which you can access the components of the audio system on your machine.

An AudioFormat object defines the characteristics of the sound stream you want to obtain.  There are a few different parameters you must set and there are a range of acceptable values which will be supported by the majority of sound cards:

  • Sample rate: This says how many times per second the sound will be sampled
  • Sample size: The number of bits in each sample taken
  • Number of channels: Whether this input is mono or stereo
  • Signed: Whether the sound samples are signed or unsigned
  • Endianness: For multi-byte sample sizes says whether the byte order is big or little endian

Since I was only interested in monitoring a line level I chose fairly conservative values:

  • Sample Rate: 8000.  This is about the lowest commonly used sample rate.  CD quality sound is 44.1kHz, but this is overkill for a simple sound monitor
  • Sample Size: 16 (bits).  You can also use 8 bit samples depending on the resolution you are after.
  • Channels: 1 My interest was in overall sound level so mono sound was OK
  • Signed: true.  This is the more common option, and is easier to deal with in java since all java primitives are signed.
  • Endianness: Since my samples are 16 bit then byte order matters.  Most common audio formats (eg WAV) are little endian as is the x86 architecture, so this is the common choice.

In practice this means that to get a TargetDataLine I can do something like this:

 AudioFormat audioFormat = getAudioFormat();
 targetDataLine = (TargetDataLine) AudioSystem.getTargetDataLine(audioFormat);
 private AudioFormat getAudioFormat(){
   float sampleRate = 8000.0F;
   int sampleSizeInBits = 16;
   int channels = 1;
   boolean signed = true;
   boolean bigEndian = false;
   return new AudioFormat(sampleRate,sampleSizeInBits,channels,signed,bigEndian);
 }

Once I have my TargetDataLine I then need to activate it before I can read any data from it.  Activating the line is as simple as:

 targetDataLine.open();
 targetDataLine.start();

Once the line is open you can begin to read data from it.  The usual way to do this is in a separate thread where you have a buffer which you fill with data from the line and then process.  A read from a line will block until the buffer is full, so the size of your buffer will determine how often you process the data.

The amount of data produced will be a function of your sample rate, sample size and number of channels.  If you have one channel and 16 bit samples at 8kHz then every second you will produce 8000 * 16 * 1 bits of data or 16000 bytes of data.  Therefore a 16000 byte buffer will be filled every second, and 8000 byte buffer will fill in half a second.

Reading the input can therefore be done is a loop as follows:

byte [] buffer = new byte[2000];
while (true) {
  int bytesRead = targetDataLine.read(buffer,0,buffer.length);
}

The remaining problem is then how to process the filled buffer to get the overall level.  There are a few options for this, you could work out the average sound level over the whole sample, or you could work out the peak level throughout the sample.  I chose the latter method, but the basic process would be the same in either case.

Since my samples were 16bit I had to take into account that each sample was composed of two bytes, and I needed to recombine these into a signed short before processing them.  This involves using a bit shifting operation to combine the two bytes, and taking into account the little endian byte order.

short max;
if (bytesRead >=0) {
 max = (short) (buffer[0] + (buffer[1] << 8));
 for (int p=2;p<bytesRead-1;p+=2) {
   short thisValue = (short) (buffer[p] + (buffer[p+1] << 8));
   if (thisValue>max) max=thisValue;
 }
 System.out.println("Max value is "+max);
}

For an 8 bit sample I could use the individual bytes directly.  For a big endian sample the p and p+1 positions in the generation of the short would have to be reversed.

Using this method I can now sample the input line at any rate I choose to get the raw data from which a peak meter could be produced.  The final thing to remember is that our perception of sound should always be viewed on a log scale.  If you are creating a sound meter you should therefore log transform the data before plotting to gain a more realistic view of the sound level.

Tags: , , ,
Posted in Computing | Comments Off


Exporting SVG from Java

Sunday, February 8th, 2009

Several programs I’ve written in Java have had an image export componet to them.  Up until now the export has always only been as a bitmap image.  This is very easy to do using a BufferedImage and an ImageWriter.  Given a Component (all AWT or Swing widgets) and a file you can create a PNG very simply:

  BufferedImage b = new BufferedImage(c.getWidth(),c.getHeight(),BufferedImage.TYPE_INT_RGB);
  Graphics g = b.getGraphics();
  c.paint(g);
  ImageIO.write((BufferedImage)(b),"PNG",file);

I’ve always thought though that it should be possible to create an SVG file from an arbitrary component, but never got round to trying it.  However I got snowed in last week and decided to give it a go, and it turned out to be somewhat easier than I thought.

There is a natural fit between the abstract methods of the AWT Graphics class and the primitive components in the SVG spec – they even use the same coordinate system.  Creating an SVG export class therefore simply requires implementing the SVG spec and translating the method calls to SVG code.

Although the basic premise is fairly straightforward – there were a few gotchas which had be scratching my head!

  1. The Graphics class has a create() method which returns a Graphics object.  Initially I was just returning the same object each time and this seemed to work.  With more complex nested objects though the coordinates were getting messed up.  I worked out that it was important that each Graphics instance keeps track of its own translate coordinates as these are managed separately in each instance.
  2. Translate coordinates are relative and not absolute.  When a new set of coordinates are passed via the translate() method these must be added to the current translation, and don’t replace it.
  3. Fonts are problematic.  Size is fine, but font names are not trivially converted between the java font name and something SVG can understand.  It may be possible to figure out some generic rules or use a translation table, but since my applications all use a default sans-serif font of varying sizes I’ve hardcoded this for now.
  4. Some methods, such as font metric creation are a bit of a pain to write – so I’ve cheated and created an unused BufferedImage Graphics object within my class of the appropriate size and pass any difficult method calls to that – I can then discard it at the end.  This is slightly wasteful of memory, but is a quick fix.
  5. I coded my initial implementation on a Mac and it was all working.  I then tried it on Windows and Linux and got empty SVG files.  I resorted to a question on StackOverflow to figure out what was going on, and it turned out that the double buffering mechanism was causing the problems.  If this was enabled then all I saw in my Graphics class was a single call to drawImage() with a complete bitmap in it.  This was passed directly from the offscreen buffer, but meant I couldn’t intercept the individual shape method calls.  I therefore now use the RenderManager to disable double buffering on the component before calling its paint method which ensures nothing comes between my class and the Graphics methods calls.
  6. Lots of code seems to assume that the object which gets passed to paint always implemets Graphics2D.  I put a conditional check into my code for this, but it might be a good idea to implement (with null methods) the extra Graphics2D methods if this was to be used genrically.

At the end of all of this I now have a class which implements a single static method of:

public static String convertToSVG (Component c);

Which has worked in every instance I’ve tried.  There are some methods I’ve not implemented – namely the paintImage methods and some of the drawPolygon methods.  The polygon methods should be easy enough.  The paintImage may be more problematic, but I don’t feel too bad leaving these out since there’s not much advantage to using SVG if you’re just going to stick a bitmap into it.

The SVG code will be in the 0.3 release of SeqMonk and I may release it as a stand alone package if anyone shows any interest in it.


                

Tags: , ,
Posted in Computing | Comments Off


Swing Speedup!

Friday, October 10th, 2008

I’d never really been one to subscribe to the popular perception that swing is so horrifically slow that it isn’t suitable for use in any serious application.  Most of the problems I’ve seen with swing based programs have resulted from people not knowing how to make processing tasks run in a separate thread and therefore causing the interface to block until they complete.

For some reason I seem to have got into a habit of writing java applications which operate on enormous datasets and which stretch swing to its limits.  For the most part it has coped very well, but this last week provided me with a display where the lag in swing was noticable.

Although the display didn’t update very often I did need to repaint to highlight one part of the screen and the lag from clicking to the display changing was around a second – which was too long.  All of the time was spent in the swing redraw code (Sun’s part – not mine!).

Bizarrely when I moved the program onto my mac the interface was very snappy – no lag at all, and it was the same on my linux box, so there was something odd about the windows drawing code.

A while ago I remembered reading that Sun had added OpenGL 2D acceleration into swing.  Since this seemed a good time to look into this I read some of the Sun documentation about this.  To cut a long story short OpenGL is present from java 1.5 onwards, but is disabled by default.  To enable it you simply need to add the following as a JRE argument:

-Dsun.java2d.opengl=true

On my application this had a dramatic effect.  The redraws which previously took around a second were now virtually instant.  I’ve since applied this to a couple of other large applications and all have shown a noticable increase in responsiveness.

It’s not every day you come across a magic bullet, but this came pretty close as far as I was concerned.

Tags: , ,
Posted in Computing | Comments Off