Why are charset names not constants?

serg · Nov 5, 2009

Charset issues are confusing and complicated enough by themselves, but on top of that you have to remember the exact names of your charsets. Is it "utf8"? Or "utf-8"? Or maybe "UTF-8"? When searching the internet for code samples you will see all of the above. Why not just make them named constants and use Charset.UTF8?

Answer

Kevin Bourrillion · Nov 5, 2009

The simple answer to the question asked is that the available charset strings vary from platform to platform.

However, there are six that are required to be present, so constants could have been made for those long ago. I don't know why they weren't.
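As a rough illustration of the "six that are required" point, the sketch below asks the running JVM whether it supports each of the standard names listed in the java.nio.charset.Charset documentation. The class name RequiredCharsets and the helper missing() are made up for this example. It uses the diamond operator, so it assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this example.
class RequiredCharsets {
    // The six names the Charset documentation lists as supported on every Java platform.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns the required names this JVM cannot resolve (expected to be empty).
    static List<String> missing() {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Missing required charsets: " + missing());
        // Charset names are not case-sensitive, so these two lookups
        // resolve to the same charset.
        System.out.println(Charset.forName("utf-8").equals(Charset.forName("UTF-8")));
    }
}
```

Anything outside those six names (including informal aliases seen in code samples) may or may not resolve, depending on the platform.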

JDK 1.4 did a great thing by introducing the Charset type. From that point on, they wouldn't have wanted to provide String constants anymore, since the goal is to get everyone using Charset instances. So why not provide the six standard Charset constants, then? I asked Martin Buchholz since he happens to be sitting right next to me, and he said there wasn't a particularly great reason, except that at the time, things were still half-baked -- too few JDK APIs had been retrofitted to accept Charset, and of the ones that were, the Charset overloads usually performed slightly worse.

It's sad that it's only in JDK 1.6 that they finally finished outfitting everything with Charset overloads, and that this backwards performance situation still exists (the reason why is incredibly weird and I can't explain it, but it is related to security!).

Long story short -- just define your own constants, or use Guava's Charsets class which Tony the Pony linked to (though that library has not actually been released yet).
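A minimal sketch of the "define your own constants" option might look like the following. The class name MyCharsets and the roundTrip helper are invented for illustration. It assumes JDK 1.6 or later, because it relies on the String.getBytes(Charset) and String(byte[], Charset) overloads.

```java
import java.nio.charset.Charset;

// Hypothetical constants holder; the name MyCharsets is an assumption.
final class MyCharsets {
    // Each name is one of the six charsets required on every Java platform,
    // so these lookups are not expected to fail at class-initialization time.
    static final Charset US_ASCII = Charset.forName("US-ASCII");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    static final Charset UTF_8 = Charset.forName("UTF-8");

    private MyCharsets() {
        // Not instantiable: the class only holds constants.
    }

    // Encodes to UTF-8 bytes and decodes again, using the constant
    // instead of a string literal that could be misspelled.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(UTF_8);
        return new String(bytes, UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("h\u00e9llo").equals("h\u00e9llo"));
    }
}
```

The spelling of each charset name then lives in exactly one place, and call sites no longer need to handle UnsupportedEncodingException as they do with the String-based overloads.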

Update: JDK 7 added a StandardCharsets class (java.nio.charset.StandardCharsets) with Charset constants for the six required charsets.
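On JDK 7 or later, usage could look roughly like this. The class StandardCharsetsDemo and its two helper methods are made up for the example; StandardCharsets.UTF_8 is the JDK constant.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only StandardCharsets comes from the JDK.
class StandardCharsetsDemo {
    // Encodes text as UTF-8 without naming the charset by string.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = toUtf8("\u20ac"); // the euro sign
        System.out.println(encoded.length + " bytes, round trip ok: "
                + fromUtf8(encoded).equals("\u20ac"));
    }
}
```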