In the application I’m working on I have to deal with large amounts of data, which means handling tens of millions of java objects. One of the biggest problems we face is keeping the memory usage of the program under control.
The main object which accounts for the vast majority of the memory consumption contains only a few fields. These are:
- A reference to a different object
- An int to specify a start
- An int to specify and end
- An int to act as a flag (which only has 3 values)
To try to improve the memory profile of this object I tried changing the flag value to use a byte instead of an int, and also storing a start and a length (as a short) instead of an end value (as an int).
Making these changes I was surprised to find that the memory usage of the application didn’t change at all. I didn’t see how this could be, since I was using smaller data structures so I went looking for details of how java allocates memory. I was surprised at the results!
Basically, most class level variables in java consume the same amount of memory (4 bytes), except for longs and doubles which consume 8 bytes. This means that it doesn’t matter if your variable is a boolean or an int it still uses the same amount of memory. If you put a series of variables into an array then there is an overhead for using the array in the first place, but the individual members of that array are packed efficiently such that an array of bytes would take less memory than an array of ints of the same length.
In our case I tried out a few different options for combining the 3 ints at the core of our object to try to save memory. I took the standard 3 int version of the class as the basis for comparison.
- As stated above moving to an int, a short and a byte used exactly the same amount of memory, and slowed the application down slightly as more calculation had to be done to extract the end value each time.
- Using a single long and bitshifting to fit an int, short and byte into it did save memory, but is not as efficient as I’d hoped since there is still one unused byte in the long – and I’m only using 3 possible values out of the range which I could encode in the byte.
- Using a byte array to store 7 bytes (not wasting the extra byte used in the long) was much less efficient than any of the other solutions.
We’ve therefore gone down the route of packing data into a single long, which is saving ~20% of the previous memory usage, at the expense of some overhead during the packing/unpacking. The packing overhead is pretty minimal though, adding only 2 seconds on a test which took around a minute in the original code.
Doing the packing/unpacking is fairly straightforward using the bitshifting operators in java. In the following excerpt packedPosition is a long, start is an int, length is a short and strand is a byte.
packedPosition = start; packedPosition += (long)length<<32; packedPosition += (long)strand<<48;
Getting the data back out is also fairly simple. For example, extracting the length would be achieved by doing:
We’re now looking for other cases where adding this extra complexity is worth the memory savings we can achieve.