Does java -Xmx1G mean 10^9 or 2^30 bytes?

nealmcb picture nealmcb · Sep 30, 2015 · Viewed 13.9k times · Source

And in general, are the units used for the -Xmx, -Xms and -Xmn options ("k", "M" and "G", or the less standard possibilities "K", "m" or "g") Binary prefix multiples (i.e. powers of 1024), or are they powers of 1000?

The manuals say they represent kilobytes (kB), megabytes (MB) and gigabytes (GB), suggesting they are powers of 1000 as defined in the original SI system. My informal tests (that I'm not very confident about) suggest they are really kibibytes (kiB), mebibytes (MiB) and gibibytes (GiB), all powers of 1024.

So which is right? E.g. what Java code would show the current size?

Using multiples of 1024 is not surprising for RAM sizes, since RAM is typically physically laid out by doubling up hardware modules. But using units in a clear and standard way is ever more important as we get to bigger and bigger powers, since the potential for confusion grows. The unit "t" is also accepted by my JVM, and 1 TiB is 10% bigger than 1 TB.

Note: if these really are binary multiples, I suggest updating the documentation and user interfaces to be very clear about that, with examples like "Append the letter k or K to indicate kibibytes (1024 bytes), or m or M to indicate mebibytes (1048576 bytes)". That is the approach taken, e.g., in Ubuntu: UnitsPolicy - Ubuntu Wiki.

Note: for more on what the options are used for, see e.g. java - What are the Xms and Xmx parameters when starting JVMs?.

Answer

Boann picture Boann · Sep 30, 2015

Short answer: All memory sizes used by the JVM command line arguments are specified in the traditional binary units, where a kilobyte is 1024 bytes, and the others are increasing powers of 1024.

Long answer:

This documentation page on the command line arguments says the following applies to all the arguments accepting memory sizes:

For example, to set the size to 8 GB, you can specify either 8g, 8192m, 8388608k, or 8589934592 as the argument.

For -Xmx, it gives these specific examples:

The following examples show how to set the maximum allowed size of allocated memory to 80 MB using various units:

-Xmx83886080
-Xmx81920k
-Xmx80m

Before I thought to check the documentation (I assumed you already had?), I checked the source of HotSpot and found the memory values are parsed in src/share/vm/runtime/arguments.cpp by the function atomull (which seems to stand for "ASCII to memory, unsigned long long"):

// Parses a memory size specification string.
static bool atomull(const char *s, julong* result) {
  julong n = 0;
  int args_read = sscanf(s, JULONG_FORMAT, &n);
  if (args_read != 1) {
    return false;
  }
  while (*s != '\0' && isdigit(*s)) {
    s++;
  }
  // 4705540: illegal if more characters are found after the first non-digit
  if (strlen(s) > 1) {
    return false;
  }
  switch (*s) {
    case 'T': case 't':
      *result = n * G * K;
      // Check for overflow.
      if (*result/((julong)G * K) != n) return false;
      return true;
    case 'G': case 'g':
      *result = n * G;
      if (*result/G != n) return false;
      return true;
    case 'M': case 'm':
      *result = n * M;
      if (*result/M != n) return false;
      return true;
    case 'K': case 'k':
      *result = n * K;
      if (*result/K != n) return false;
      return true;
    case '\0':
      *result = n;
      return true;
    default:
      return false;
  }
}

Those constants K, M, G are defined in src/share/vm/utilities/globalDefinitions.hpp:

const size_t K                  = 1024;
const size_t M                  = K*K;
const size_t G                  = M*K;

All this confirms the documentation, except that support for the T suffix for terabytes was apparently added later and is not documented at all.

It is not mandatory to use a unit multiplier, so if you want one billion bytes you can write -Xmx1000000000. If you do use a multiplier, they're binary, so -Xmx1G means 230 bytes, or one stick o' RAM.

(Which is not really surprising, because Java predates the IEC's attempt to retroactively redefine existing words. Confusion could have been saved if the IEC had merely advised disambiguating the memory units with the qualifiers "binary" and "decimal" the occasional times their meaning wasn't clear. E.g., binary gigabytes (GB2) = 10243 bytes, and decimal gigabytes (GB10) = 10003 bytes. But no, they redefined the words everyone was already using, inevitably exploding confusion, and leaving us stuck with these clown terms "gibibyte", "tebibyte" and the rest. Oh God spare us.)