Why does Java have a "String" type and not "string"?

Ravi Gupta · Jan 21, 2010

Wrapper classes are just fine, and their purpose is well understood. But why is the primitive type omitted?

Answer

Robert Fraser · Jan 21, 2010

It depends what you mean by "primitive"

"Primitive" in Java is usually taken to mean "value type". C# does have a string keyword, but it acts exactly the same as Java's String; it's just highlighted differently by the editor. The C# keyword is an alias for the class System.String, and Java's String is the class java.lang.String. String is not a value type in either language, so in this sense it's not a primitive.

If by "primitive" you mean built into the language, then String is a primitive. It just uses a capital letter. String literals (those things in quotes) are automatically instances of java.lang.String (System.String in C#), and + is used for concatenation. So by this token, strings (and arrays) are as primitive as ints, longs, etc.
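
To make the "built into the language" point concrete, here is a minimal sketch. The class and method names (LiteralDemo, literalClassName, concat) are made up for illustration; only the literal syntax and the + operator come from the language itself.

```java
class LiteralDemo {
    // A quoted literal is already an instance of java.lang.String.
    static String literalClassName() {
        return "hello".getClass().getName();
    }

    // The + operator concatenates strings and converts other operands to text.
    static String concat(String name, int count) {
        return name + " x" + count;
    }

    public static void main(String[] args) {
        System.out.println(literalClassName()); // prints "java.lang.String"
        System.out.println(concat("tea", 2));   // prints "tea x2"
    }
}
```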

First, what is a String?

String is not a wrapper. String is a reference type, while primitive types are value types. This means that if you have:

int x = 5;
int y = x;

x and y each hold their own copy of the value 5. But with:

String x = "a";
String y = x;

x and y both contain a reference (a pointer) to the same String object, which holds the character "a" along with bookkeeping such as a length, a class pointer, a monitor, and, in older JVMs, an offset; the exact layout is up to the JVM. Strings behave like primitives because they're immutable, so this sharing is usually not an issue. However, if you, say, used reflection to change the contents of the string (don't do this!), both x and y would see the change. With a mutable reference type such as char[], the sharing is directly visible:

char[] x = "a".toCharArray();
char[] y = x;
x[0] = 'b';
System.out.println(y[0] == 'b'); // prints "true"

So don't just use char[] in place of String (unless this shared, mutable behavior is what you want, or you're really trying to reduce memory usage).
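
For comparison, here is a small sketch of the aliasing above next to an explicit copy. AliasDemo and its method names are hypothetical; clone() on an array is standard Java.

```java
class AliasDemo {
    // y is an alias for x, so a write through x is visible through y.
    static char aliased() {
        char[] x = "a".toCharArray();
        char[] y = x;
        x[0] = 'b';
        return y[0];
    }

    // y is an independent copy, so the write through x does not affect it.
    static char copied() {
        char[] x = "a".toCharArray();
        char[] y = x.clone();
        x[0] = 'b';
        return y[0];
    }

    // Assigning a String copies the reference, not the characters.
    static boolean sameStringObject() {
        String x = "a";
        String y = x;
        return x == y;
    }

    public static void main(String[] args) {
        System.out.println(aliased());          // prints "b"
        System.out.println(copied());           // prints "a"
        System.out.println(sameStringObject()); // prints "true"
    }
}
```

Sharing the String reference is harmless only because nothing can legitimately modify the object behind it.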

Every Object is a reference type -- that means all classes you write, every class in the framework, and even arrays. The only value types are the eight primitive types (byte, short, int, long, float, double, char, and boolean).
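
One way to see the difference is to pass each kind of value to a method. This is only a sketch; ValueVsReference and the bump helpers are invented names.

```java
class ValueVsReference {
    // Receives a copy of the int; the caller's variable is unaffected.
    static void bump(int n) {
        n++;
    }

    // Receives a copy of the reference; both references point at one array.
    static void bump(int[] cell) {
        cell[0]++;
    }

    static int primitiveAfterCall() {
        int n = 1;
        bump(n);
        return n;
    }

    static int arrayAfterCall() {
        int[] cell = {1};
        bump(cell);
        return cell[0];
    }

    // Arrays are objects, so an int[] can be stored in an Object variable.
    static boolean arrayIsObject() {
        Object o = new int[0];
        return o.getClass().isArray();
    }

    public static void main(String[] args) {
        System.out.println(primitiveAfterCall()); // prints "1"
        System.out.println(arrayAfterCall());     // prints "2"
        System.out.println(arrayIsObject());      // prints "true"
    }
}
```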

Why isn't String mutable like char[]?

There are a couple of reasons for this, but it mostly comes down to psychology and implementation details:

  • Imagine the chaos you'd have if you passed a string into another function and that function changed it somehow. Or what if it saved it somewhere and changed it in the future? With most reference types, you accept this as part of the type, but the Java developers decided that, at least for strings, they didn't want users to have to worry about that.
  • A string can't be read or written atomically the way a single int can, so a mutable string shared between threads would need synchronization.
  • String literals (the things you put in your code in quotes) might be immutable at the machine level [1] (for security reasons). This could be worked around by copying them all into another part of memory when the program starts up, or by using copy-on-write, but that's slow.
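
The first point can be sketched by handing the same text to a method once as a String and once as a mutable StringBuilder. The names ImmutabilityDemo and shout are hypothetical; the String and StringBuilder calls are standard.

```java
class ImmutabilityDemo {
    // toUpperCase() returns a new String; reassigning the parameter
    // only changes the local reference, never the caller's object.
    static void shout(String s) {
        s = s.toUpperCase();
    }

    // append() modifies the builder in place, so the caller sees it.
    static void shout(StringBuilder sb) {
        sb.append("!");
    }

    static String stringAfterCall() {
        String s = "hi";
        shout(s);
        return s;
    }

    static String builderAfterCall() {
        StringBuilder sb = new StringBuilder("hi");
        shout(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringAfterCall());  // prints "hi"
        System.out.println(builderAfterCall()); // prints "hi!"
    }
}
```

With String, the caller never has to wonder whether the callee changed or kept a reference to the text.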

Why don't we have a value-type version of a string?

Basically, performance and implementation details, as well as the complexity of having two different string types. Other value types have a fixed memory footprint. An int is always 32 bits, a long is always 64 bits, a boolean is conceptually 1 bit, etc. [2] Among other things, this means that they can be stored on the stack, so that all parameters to a function live in one place. A string has no fixed length, and making gigantic copies of strings all over the place would kill performance.
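
As a rough illustration of the fixed-footprint point: the wrapper classes expose each primitive's width as a SIZE constant, while a String's length depends on its contents. SizeDemo and its method names are invented for this sketch. (There is no Boolean.SIZE, which fits the spec leaving the size of a boolean open.)

```java
class SizeDemo {
    // Widths in bits, as defined by the standard wrapper classes.
    static int intBits()  { return Integer.SIZE; }
    static int longBits() { return Long.SIZE; }
    static int charBits() { return Character.SIZE; }

    // Strings have no fixed size; length varies per value.
    static int[] lengths() {
        return new int[] { "".length(), "hello".length() };
    }

    public static void main(String[] args) {
        System.out.println(intBits());  // prints "32"
        System.out.println(longBits()); // prints "64"
        System.out.println(charBits()); // prints "16"
        int[] l = lengths();
        System.out.println(l[0] + " " + l[1]); // prints "0 5"
    }
}
```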

See also: "In C#, why is String a reference type that behaves like a value type?" It refers to .NET, but it is just as applicable to Java.

1 - In C/C++ and other natively-compiled languages, this is usually true because string literals are placed in a read-only segment of the process, which the OS normally stops you from editing. In Java, this is usually untrue, since the JVM loads the class files onto the heap, so you could edit a string there. However, there's no reason a Java program couldn't be compiled natively (there are tools which do this), and some architectures (notably some versions of ARM) can directly execute Java bytecode.

2 - In practice, some of these types are a different size at the machine level. E.g., booleans are stored as word-sized values on the stack (32 bits on x86, 64 bits on x64). In classes and arrays they may be treated differently. This is all an implementation detail that's left up to the JVM -- the spec says a boolean is either true or false, and the machine can figure out how to store it.