Posts Tagged ‘memory’

The true cost of object creation in Java

Tuesday, December 6th, 2011

I’ve been spending some time trying to optimise the data loading part of one of my Java projects.  The nature of the data we use means that we have to create hundreds of millions of objects, each of which internally stores only a single long value (several fields are actually packed into this value using bitmasks, since this is more memory efficient).

When loading our data we are therefore parsing hundreds of millions of long values and creating the associated objects.  This can take a few minutes to complete, and having profiled the code, it seems that object creation is what slows everything down.  I therefore ran some tests to work out exactly how slow creating objects is relative to Java's primitives.  My test code is below:

public class CreateTest {

    // Number of values to create in each test
    private static final int COUNT = 50000000;

    public static void main (String [] args) {
        // Time allocating and filling an array of primitive longs
        long start = System.currentTimeMillis();

        long [] primitives = new long[COUNT];
        for (int i=0;i<primitives.length;i++) {
            primitives[i] = i;
        }

        long end = System.currentTimeMillis();

        System.out.println("Making 50 million longs took "+(end-start)+"ms");

        // Time creating the same number of Long wrapper objects
        start = System.currentTimeMillis();

        Long [] objects = new Long[COUNT];
        for (int i=0;i<objects.length;i++) {
            // new Long(long) is deprecated since Java 9, but it is what was measured here
            objects[i] = new Long(i);
        }

        end = System.currentTimeMillis();

        System.out.println("Making 50 million Longs took "+(end-start)+"ms");
    }
}

It’s reasonable to think that there will be an overhead for object creation, but I was surprised by the results:

Making 50 million longs took 199ms
Making 50 million Longs took 10809ms

So that’s a roughly 50-fold overhead (10809ms versus 199ms, about 54x) for the object wrappers around these numbers.  What’s worse is that this overhead seems to be incurred inside the JVM in such a way that you can’t use multi-threading to get around it.  I tried refactoring the code to have 5 threads creating 10 million objects each, and the total runtime across 5 cores was pretty much exactly the same as doing the same thing on a single core.  On this evidence, if you want to have 50 million objects available in your program then you’re just going to have to wait about 10 seconds for them, however many cores you throw at the problem.
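For reference, a multi-threaded version of the test could look something like the sketch below. This is not the code used for the measurement above: the class name, the use of an ExecutorService, and the way the range is split are all assumptions made for illustration. The count defaults to 5 million so that it runs in a default-sized heap, and can be raised to 50 million via the first command-line argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the original test code.
public class ThreadedCreateTest {

    // Creates 'total' Long objects, splitting the index range across 'threads' workers.
    public static Long[] createInParallel(int total, int threads) {
        Long[] objects = new Long[total];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = total / threads;
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                // The last worker also takes any remainder.
                final int to = (t == threads - 1) ? total : from + chunk;
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        // Long.valueOf only reuses cached objects for a small
                        // range of values (-128 to 127 is guaranteed).
                        objects[i] = Long.valueOf(i);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each worker to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Worker failed", e);
        } finally {
            pool.shutdown();
        }
        return objects;
    }

    public static void main(String[] args) {
        int total = args.length > 0 ? Integer.parseInt(args[0]) : 5000000;
        long start = System.currentTimeMillis();
        Long[] objects = createInParallel(total, 5);
        long end = System.currentTimeMillis();
        System.out.println("Making " + objects.length + " Longs on 5 threads took " + (end - start) + "ms");
    }
}
```

Timing this against a single-threaded run with the same count and heap settings is the comparison described above; results will vary with JVM version and garbage collector settings.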

I also investigated other options for object creation. Specifically, I made my object Cloneable and then used clone() to create new instances rather than calling the constructor.  The constructors for my object are very lightweight, so it was disappointing, but not surprising, to see that this had no appreciable effect on the time taken for object creation.

I’ve even toyed with the idea of just storing these objects as an array of longs and avoiding this overhead altogether.  I could still extract the relevant data by using a set of static methods, but what I can’t then easily do is sort these values with a custom ordering (which I need to), since the standard library's sort methods only accept a custom comparator for arrays of objects, not for primitive arrays, and boxing the data in order to sort it would defeat the point.
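To make the packed-long idea concrete, here is a minimal sketch. The field names and bit layout (an 8-bit group, a 32-bit position and 8 bits of flags) are hypothetical and are not the layout used in the real project. It also shows one limited case where no custom sort is needed: if the primary sort key happens to sit in the highest bits and the packed values stay non-negative, the natural ordering used by Arrays.sort(long[]) coincides with ordering by that key. Whether that applies to any given data set is a separate question.

```java
import java.util.Arrays;

// Hypothetical layout, for illustration only:
// bits 40-47: group (8 bits), bits 8-39: position (32 bits), bits 0-7: flags (8 bits)
public class PackedValue {

    private static final int FLAGS_BITS = 8;
    private static final int POSITION_BITS = 32;
    private static final long FLAGS_MASK = (1L << FLAGS_BITS) - 1;
    private static final long POSITION_MASK = (1L << POSITION_BITS) - 1;
    private static final long GROUP_MASK = 0xFFL;

    // Packs the three fields into a single non-negative long.
    public static long pack(int group, long position, int flags) {
        return ((group & GROUP_MASK) << (POSITION_BITS + FLAGS_BITS))
             | ((position & POSITION_MASK) << FLAGS_BITS)
             | (flags & FLAGS_MASK);
    }

    // Static accessors extract each field without creating any objects.
    public static int group(long packed) {
        return (int) ((packed >>> (POSITION_BITS + FLAGS_BITS)) & GROUP_MASK);
    }

    public static long position(long packed) {
        return (packed >>> FLAGS_BITS) & POSITION_MASK;
    }

    public static int flags(long packed) {
        return (int) (packed & FLAGS_MASK);
    }

    public static void main(String[] args) {
        long[] values = { pack(2, 100, 1), pack(1, 500, 0), pack(1, 20, 3) };
        // Natural long ordering: by group, then position, then flags.
        Arrays.sort(values);
        for (long v : values) {
            System.out.println(group(v) + " " + position(v) + " " + flags(v));
        }
    }
}
```

Any ordering that does not match the bit layout would still need either a hand-written sort over the long array or boxing into objects.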

I’m therefore stuck with the biggest bottleneck in my program being something which I know could be made around 50x faster (which would make everything hugely quick), but not within the confines of the Java language.

Posted in Computing | Comments Off


Getting the Java heap size you asked for

Friday, August 26th, 2011

In a recent post I discussed a method we’re using for automatically setting the Java heap size appropriately at runtime. It now turns out that setting the heap size is complicated by the fact that the heap size you request on the command line isn’t necessarily what you are given. In some cases the differences are modest, but sometimes they can be significant, amounting to hundreds of megabytes of discrepancy.

The simple test I did was to compare the heap size requested by setting the -Xmx value on the java command with the actual amount of available memory as reported by Runtime.getRuntime().maxMemory().  What I found was that the relationship between these two values isn’t 1:1, isn’t fixed at a given ratio, and is platform (and indeed VM) dependent.
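A minimal sketch of such a test program is shown below. The class name HeapReport is an assumption made for illustration; Runtime.getRuntime().maxMemory() is the call named above and returns a value in bytes.

```java
// Hypothetical helper class: reports the maximum heap the JVM says it will use.
public class HeapReport {

    // maxMemory() returns bytes; convert to whole megabytes.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available: " + maxHeapMb() + "MB");
    }
}
```

Running it as, for example, `java -Xmx512m HeapReport` and comparing the printed figure with 512 shows the discrepancy for that request on that VM.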

According to this bug report the actual implementation of -Xmx is VM-dependent, so that the value you supply on the command line is merely a suggestion to the VM and it’s free to do whatever it likes.  Because I’d like my software to work consistently on all platforms I therefore had a look at what the different VMs actually do.

The OSX VM actually stays very close to the requested amount of memory across the whole range of requested heap sizes.  The Linux and Windows VMs, though, overcommit at small heap sizes (there seems to be a minimum allowed heap size of ~10MB), but undercommit by up to 12% at larger heap sizes.  When you’re requesting a heap of several gigabytes in size, a 12% loss is a significant amount of memory.

Our immediate solution to this problem is to do a trial run where we launch a small program which reports the actual heap size allocated.  We then relaunch the normal java command, increasing the heap request size by a correction factor calculated from the trial run.  This seems to produce consistent results on all platforms and gives us what we asked for in the first place.
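The correction step could be sketched roughly as follows. This is an illustration rather than our actual launcher: the class and method names, the example figures and the main class name MyApp are all hypothetical, and it assumes the shortfall scales in proportion to the request, which is a simplification. Launching the trial run itself is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the correction arithmetic, not the real launcher.
public class HeapCorrection {

    // Scales the request by (requested / reported), rounding up to a whole megabyte.
    // Assumes the VM's shortfall is proportional to the size of the request.
    public static long correctedRequestMb(long requestedMb, long reportedMb) {
        if (requestedMb <= 0 || reportedMb <= 0) {
            throw new IllegalArgumentException("Heap sizes must be positive");
        }
        return (requestedMb * requestedMb + reportedMb - 1) / reportedMb;
    }

    // Builds the command line for the real launch with the corrected -Xmx value.
    public static List<String> buildCommand(long heapMb, String mainClass) {
        return Arrays.asList("java", "-Xmx" + heapMb + "m", mainClass);
    }

    public static void main(String[] args) {
        long requestedMb = 4096; // what we actually want
        long reportedMb = 3641;  // hypothetical figure reported by the trial run
        long corrected = correctedRequestMb(requestedMb, reportedMb);
        System.out.println(String.join(" ", buildCommand(corrected, "MyApp")));
    }
}
```

Because the relationship between requested and actual heap isn't a fixed ratio, the corrected launch may still miss slightly; re-checking the reported heap after the relaunch would confirm it.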
